<?xml version="1.0" encoding="UTF-8"?>
<part version="5.0" xml:id="sda1" xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:m="http://www.w3.org/1998/Math/MathML"
xmlns:html="http://www.w3.org/1999/xhtml"
xmlns:db="http://docbook.org/ns/docbook">
<info>
<title>Structured Data and Applications 1</title>
<author>
<personname><firstname>Martin</firstname>
<surname>Goik</surname></personname>
<affiliation>
<orgname>http://medieninformatik.hdm-stuttgart.de</orgname>
</affiliation>
</author>
<legalnotice>
<para>Source code available at <uri
xlink:href="https://version.mi.hdm-stuttgart.de/git/GoikLectures">https://version.mi.hdm-stuttgart.de/git/GoikLectures</uri></para>
</legalnotice>
</info>
<chapter xml:id="prerequisites">
<title>Prerequisites</title>
<section xml:id="resources">
<title>Lecture resources</title>
<glosslist>
<glossentry>
<glossterm>Recommended books</glossterm>
<glossdef>
<itemizedlist>
<listitem>
<para><xref linkend="bib_fawcett2012"/></para>
</listitem>
<listitem>
<para><xref linkend="bib_Walmsley02"/></para>
</listitem>
</itemizedlist>
</glossdef>
</glossentry>
<glossentry>
<glossterm>Lecture notes as PDF</glossterm>
<glossdef>
<para><uri
xlink:href="http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/printversion.pdf">http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/printversion.pdf</uri></para>
<caution>
<para>Some figures and videos are left blank.</para>
</caution>
</glossdef>
</glossentry>
<glossentry>
<glossterm>Live lecture additions</glossterm>
<glossdef>
<para><link
xlink:href="https://cloud.mi.hdm-stuttgart.de/owncloud/public.php?service=files&amp;t=dae5c53f0a05d6661209527cee45d323">https://cloud.mi.hdm-stuttgart.de/owncloud/public.php?service=files&amp;t=dae5c53f0a05d6661209527cee45d323</link></para>
</glossdef>
</glossentry>
<glossentry>
<glossterm>List of exercises</glossterm>
<glossdef>
<para>The lecture notes contain exercises to be solved by you! A
complete list is available at <uri
xlink:href="http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/apb.html">http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/apb.html</uri>.</para>
<para>You may also want to use the PDF version of this list in
<filename
xlink:href="http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/printversion.pdf">printversion.pdf</filename>
to keep track of your progress by noting your completion status
for individual exercises.</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm><link
linkend="gloss_Javadoc"><trademark>Javadoc</trademark></link>
references and source code</glossterm>
<glossdef>
<para>The lecture notes contain a lot of <link
linkend="gloss_Javadoc"><trademark>Javadoc</trademark></link>
references. Most classes appearing within these lecture notes have
<link
linkend="gloss_Javadoc"><trademark>Javadoc</trademark></link>
generated links to the source code as well. For example when
clicking on the class name in
<classname>sda.jdbc.intro.v1.SimpleInsert</classname> you will see
the complete implementation.</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm>Links to animated figures</glossterm>
<glossdef>
<para>The lecture notes' online version contains links to <uri
xlink:href="http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/jdbcWrite.html">PDF
images</uri>. Clicking on <quote>Animated PDF Version</quote>
opens the referenced PDF, which provides a slide-like animation
in the full screen mode of Acrobat Reader or
<trademark>google-chrome</trademark>.</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm><trademark>Virtualbox</trademark> image</glossterm>
<glossdef>
<para>A <productname
xlink:href="https://www.virtualbox.org">Virtualbox</productname>
image is available at <uri
xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.rar">ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.rar</uri>
or <link
xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi">ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi</link>.</para>
<caution>
<para>Access from networks outside
<uri>hdm-stuttgart.de</uri> requires a <acronym>VPN</acronym>
connection.</para>
</caution>
<para>It contains (hopefully) all related tools from the <link
xlink:href="http://www.mi.hdm-stuttgart.de">CSM</link>
department's lecture room Linux installation:</para>
<itemizedlist>
<listitem>
<para>Eclipse J2EE version with <productname
xlink:href="http://www.eclipse.org/datatools">Database
developer tools</productname>, <productname
xlink:href="http://git-scm.com">git</productname>, <trademark
xlink:href="http://oxygenxml.com">Oxygenxml</trademark>,
<productname
xlink:href="http://testng.org/doc/eclipse.html">TestNG</productname>
and <productname
xlink:href="http://subversion.apache.org/">svn</productname>
plugins installed.</para>
</listitem>
<listitem>
<para>A running <productname
xlink:href="http://www.mysql.com/">Mysql</productname> server
preconfigured with user <quote><code>hdmuser</code></quote>,
password <quote><code>XYZ</code></quote> (<emphasis
role="bold">capital letters!</emphasis>) and database
<quote><code>hdm</code></quote>.</para>
</listitem>
<listitem>
<para><productname
xlink:href="http://www.xmlmind.com/xmleditor">Xmlmind XML
editor</productname> for visually editing technical documents
based on <productname
xlink:href="http://docbook.org/tdg5/index.html">docbook</productname>
or <productname
xlink:href="http://en.wikipedia.org/wiki/Darwin_Information_Typing_Architecture">DITA</productname>.</para>
</listitem>
</itemizedlist>
<caution>
<para>This VM is only accessible from within the <orgname
xlink:href="http://www.hdm-stuttgart.de">HdM</orgname> network.
External downloads require <productname
xlink:href="https://wiki.mi.hdm-stuttgart.de/wiki/VPN">OpenVPN</productname>.</para>
</caution>
<para>To save resources the virtual machine is based on <productname
xlink:href="http://lubuntu.net">Lubuntu</productname>, a lightweight flavour of the
<productname
xlink:href="http://www.ubuntu.com">Ubuntu</productname> Linux
distribution.</para>
</glossdef>
</glossentry>
<glossentry xml:id="oxygenLicenseKey">
<glossterm><uri>Oxygen Xml Editor</uri> license key</glossterm>
<glossdef>
<para>This is the only software component in this lecture
requiring a license. Your <orgname>HdM</orgname> affiliation
entitles you to use the <productname
xlink:href="http://oxygenxml.com/">Oxygenxml</productname>
software for educational (non-commercial) purposes. The
corresponding key is available at <uri
xlink:href="ftp://mirror.mi.hdm-stuttgart.de/Firmen/Oxygen/Keys">ftp://mirror.mi.hdm-stuttgart.de/Firmen/Oxygen/Keys</uri>.</para>
<para>This license key is compatible both with the standalone and
the eclipse plugin version of the product.</para>
<caution>
<para>The license key's <abbrev
xlink:href="http://en.wikipedia.org/wiki/File_Transfer_Protocol">ftp</abbrev>
URL is only accessible from within the <orgname
xlink:href="http://www.hdm-stuttgart.de">HdM</orgname> network.
External access requires <link
xlink:href="https://wiki.mi.hdm-stuttgart.de/wiki/VPN">VPN
activation</link>.</para>
</caution>
</glossdef>
</glossentry>
<glossentry>
<glossterm>Source code of lecture resources</glossterm>
<glossdef>
<para>The complete lecture sources are available from <link
xlink:href="https://version.mi.hdm-stuttgart.de/git/GoikLectures">https://version.mi.hdm-stuttgart.de/git/GoikLectures</link>.</para>
<para>You may simply execute <quote><command
xlink:href="http://git-scm.com/">git</command>
<option>clone</option>
<option>https://version.mi.hdm-stuttgart.de/git/GoikLectures</option>
<option>.</option></quote> to check out the master tree.</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm>Source code of exercises and examples</glossterm>
<glossdef>
<para>These sources contain a subdirectory
<filename>ws/eclipse/Jdbc</filename> which can be imported as an
eclipse project. This allows for browsing solutions to the
exercises and executing sample applications. Import into eclipse
works the following way:</para>
<itemizedlist>
<listitem>
<para>When starting eclipse choose
<filename>.../ws/eclipse</filename> as workspace</para>
</listitem>
<listitem>
<para>In eclipse click <quote>File --> Import -->
General --> Existing Projects into Workspace</quote>. After
re-selecting the current workspace
<filename>.../ws/eclipse</filename> the folder
<filename>Jdbc</filename> should be on the list of importable
projects.</para>
<para>Depending on your eclipse installation you may have to
adjust the <link
linkend="gloss_Java"><trademark>Java</trademark></link> system
libraries. Right click on your project root in the package
explorer and choose <quote>Build Path --&gt; Configure
Build Path</quote>. The <quote>JRE System Library</quote> entry
in the <quote>Libraries</quote> tab may have to be changed to
match the JRE available in your eclipse installation. You may want to
create a dummy <link
linkend="gloss_Java"><trademark>Java</trademark></link>
project to find the correct setting.</para>
</listitem>
</itemizedlist>
</glossdef>
</glossentry>
</glosslist>
</section>
<section xml:id="tools">
<title>Tools</title>
<para>The subsequent sections describe tools that help you to
carry out the exercises. The descriptions apply to
current Linux/Ubuntu systems. However, these tools are available for
<trademark>Windows</trademark> and <trademark>Apple</trademark> systems
as well. On those systems some command line hints may have to be replaced
by GUI based tools.</para>
<para>You may want to use the <link
xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi">corresponding</link>
<link xlink:href="https://www.virtualbox.org">Virtualbox image</link>
containing a complete system, which avoids installation hassles. This should
work well on reasonably current hardware.</para>
<section xml:id="eclipse">
<title><productname
xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname>
and Eclipse</title>
<para>So you like to take the hard way rather than using <link
xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi">the
virtualbox image</link>? Good! Real programmers tend to complicate
things!</para>
<para>The Eclipse IDE will be used as the primary coding tool
especially for <link
linkend="gloss_Java"><trademark>Java</trademark></link> and XML. You
may use different tools such as <productname
xlink:href="http://netbeans.org">Netbeans</productname> or
<productname
xlink:href="http://www.altova.com/de/xmlspy.html">XML-Spy</productname>.
There are however some caveats:</para>
<itemizedlist>
<listitem>
<para>Certain functionalities may not be provided</para>
</listitem>
<listitem>
<para><orgname>HdM</orgname> staff support in case of trouble
will be limited to coding issues and excludes tool support. In other
words: You are on your own!</para>
</listitem>
</itemizedlist>
<para>Installation of eclipse requires a suitable <link
linkend="gloss_Java"><trademark>Java</trademark></link> Development
Kit.</para>
<caution>
<para>Your <productname
xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname>
selection may be affected by your system's hardware. On a 64 bit
system you may install either a 32 bit or a 64 bit <productname
xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname>.
If you subsequently install eclipse you must select the 32 or 64 bit
version matching your <productname
xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname>
choice.</para>
</caution>
<para>Due to Oracle's (end-user unfriendly) licensing policy you may
have to install this component manually. For <productname
xlink:href="http://www.ubuntu.com">Ubuntu</productname> and
<productname xlink:href="http://www.debian.org">Debian</productname>
systems a standard (package manager compatible) procedure is
described at <uri
xlink:href="http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html">http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html</uri>.
This boils down to the following commands (executed as user root or
preceded by <command>sudo</command> <option>...</option>):</para>
<programlisting language="none">add-apt-repository ppa:webupd8team/java
apt-get update
apt-get install oracle-jdk7-installer</programlisting>
<para>During the installation process you will have to accept Oracle's
license terms. Your acceptance is cached, so you will not be asked
again when updating via <command>aptitude</command>
<option>update</option>; <command>aptitude</command>
<option>safe-upgrade</option>. After successful installation when
executing <command
xlink:href="http://www.oracle.com/us/technologies/java">java</command>
<option>-version</option> in a shell you should see something similar
to:</para>
<programlisting language="none">goik@goiki:~$ <emphasis role="bold">java -version</emphasis>
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) Server VM (build 23.3-b01, mixed mode)</programlisting>
<para>The Eclipse IDE comes <link
xlink:href="http://www.eclipse.org/downloads">in various
flavours</link> depending on which plugins are bundled.
For our purposes the <quote><productname>Eclipse
Classic</productname></quote> <link
linkend="gloss_Java"><trademark>Java</trademark></link> edition is
sufficient. You may however want to install other flavours like
<quote><productname>Eclipse IDE for Java EE
Developers</productname></quote> if you require features beyond this
course's needs. Remember to download the correct 32 or 64 bit version
corresponding to your <productname
xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname>.</para>
<para>Follow <uri
xlink:href="http://askubuntu.com/questions/26632/how-to-install-eclipse#answer-145018">http://askubuntu.com/questions/26632/how-to-install-eclipse#answer-145018</uri>
to install eclipse on your system.</para>
</section>
<section xml:id="oxygenxmlInstall">
<title><productname
xlink:href="http://oxygenxml.com">Oxygenxml</productname>
plugin</title>
<para>Go to <uri
xlink:href="http://www.oxygenxml.com/download_oxygenxml_developer.html?os=Eclipse">http://www.oxygenxml.com/download_oxygenxml_developer.html?os=Eclipse</uri>.
You may choose between the <quote>Plugin Update site</quote> and
<quote>Plugin zip distribution</quote> installation method. The latter
allows for better long-term eclipse plugin management.</para>
<para>There are two different ways to install Eclipse plugins:</para>
<itemizedlist>
<listitem>
<para>Use Eclipse's built in Update manager by <link
xlink:href="http://www.oxygenxml.com/download_oxygenxml_developer.html?os=Eclipse#eclipse_install_instructions">defining
a corresponding update site</link>.</para>
</listitem>
<listitem>
<para>Unzip <filename>com.oxygenxml.developer_XYZ.zip</filename>
in a subfolder of <filename>.../eclipse/dropins</filename> and
restart eclipse (as root).</para>
</listitem>
</itemizedlist>
<para>See <xref linkend="oxygenLicenseKey"/> for obtaining a license
key. You may as well install the standalone version of the Oxygen XML
Editor.</para>
</section>
<section xml:id="erMaster">
<title>ERMaster</title>
<para>ERMaster allows visual editing of physical entity relationship
diagrams. See the <link xlink:href="http://ermaster.sourceforge.net">installation
instructions</link> for adding it to an existing eclipse installation.</para>
</section>
<section xml:id="testngInstall">
<title><foreignphrase>TestNG</foreignphrase> plugin</title>
<para>Some exercises require the TestNG plugin to be installed in the
Eclipse IDE. You may proceed in a similar way as in <uri
linkend="oxygenxmlInstall">Oxygenxml</uri>. According to <uri
xlink:href="http://testng.org/doc/eclipse.html#eclipse-installation">http://testng.org/doc/eclipse.html#eclipse-installation</uri>
the required Eclipse update site URL is
<quote>http://beust.com/eclipse</quote>.</para>
</section>
<section xml:id="mysql">
<title><productname
xlink:href="http://www.mysql.com">Mysql</productname> Database
components</title>
<para>We start by installing the <productname
xlink:href="http://www.mysql.com">Mysql</productname> server:</para>
<programlisting language="none">root@goiki:~# aptitude install mysql-server
The following NEW packages will be installed:
libdbd-mysql-perl{a} libdbi-perl{a} libnet-daemon-perl{a} libplrpc-perl{a}
mysql-client-5.5{a} mysql-server-5.5
0 packages upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/17.8 MB of archives. After unpacking 63.2 MB will be used.
Do you want to continue? [Y/n/?]</programlisting>
<para>Hit <keycap>Y - return</keycap> to start. During the
installation you will be asked for the <productname
xlink:href="http://www.mysql.com">Mysql</productname> server's
<quote>root</quote> (Administrator) password:</para>
<programlisting language="none">Package configuration
┌───────────────────────────┤ Configuring mysql-server-5.5 ├────────────────────────────┐
│ While not mandatory, it is highly recommended that you set a password for the MySQL │
│ administrative "root" user. │
│ │
│ If this field is left blank, the password will not be changed. │
│ │
│ New password for the MySQL "root" user: │
│ │
│ ********_____________________________________________________________________________ │
│ │
│ &lt;Ok&gt; │
│ │
└───────────────────────────────────────────────────────────────────────────────────────┘
</programlisting>
<para>This has to be entered twice. Keep a <emphasis
role="bold">permanent</emphasis> record of this entry. Alternatively
set a bookmark to <uri
xlink:href="https://help.ubuntu.com/community/MysqlPasswordReset">https://help.ubuntu.com/community/MysqlPasswordReset</uri>
for later reference *** and don't blame me! ***.</para>
<para>At this point we should be able to connect to our newly
installed server. We create a database <quote>hdm</quote> to be used
for our exercises:</para>
<programlisting language="none">goik@goiki:~$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 42
Server version: 5.5.24-0ubuntu0.12.04.1 (Ubuntu)
Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> <emphasis role="bold">create database hdm;</emphasis>
Query OK, 1 row affected (0.00 sec)</programlisting>
<para>Following <uri
xlink:href="https://dev.mysql.com/doc/refman/5.5/en/adding-users.html">https://dev.mysql.com/doc/refman/5.5/en/adding-users.html</uri>
we add a new user and grant full access to the newly created
database:</para>
<programlisting language="none">goik@goiki:~$ mysql -u root -p
Enter password:
...
mysql> CREATE USER 'hdmuser'@'localhost' IDENTIFIED BY 'XYZ';
mysql> use hdm;
mysql> GRANT ALL PRIVILEGES ON hdm.* TO 'hdmuser'@'localhost';
mysql> FLUSH PRIVILEGES;</programlisting>
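<para>A quick smoke test may help to confirm that the new account works.
The following is only a sketch: it assumes you have reconnected as the new
user via <command>mysql</command> <option>-u hdmuser -p hdm</option>, and the
table name <code>SmokeTest</code> is a made-up example which is removed again
at the end.</para>
<programlisting language="sql">-- List the privileges of the account currently logged in
SHOW GRANTS;
-- Hypothetical throwaway table: create, fill, read and drop it again
CREATE TABLE SmokeTest (
id INT NOT NULL
,CONSTRAINT _PK_SmokeTest_id PRIMARY KEY(id)
);
INSERT INTO SmokeTest VALUES (1);
SELECT id FROM SmokeTest;
DROP TABLE SmokeTest;</programlisting>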
<para>The next step is optional. The <productname
xlink:href="http://www.ubuntu.com">Ubuntu</productname> <productname
xlink:href="http://www.mysql.com">Mysql</productname> server default
configuration allows connections only via <varname>loopback</varname>
interface i.e. <varname>localhost</varname>. If you want your
<productname xlink:href="http://www.mysql.com">Mysql</productname>
server to listen on external network interfaces as well, comment out the
<varname>bind-address</varname> parameter in
<filename>/etc/mysql/my.cnf</filename>:</para>
<programlisting language="none"># Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
# <emphasis role="bold">bind-address = 127.0.0.1</emphasis></programlisting>
<para>Since we are dealing with <link
linkend="gloss_Java"><trademark>Java</trademark></link> a <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
driver is needed to connect applications to our <productname
xlink:href="http://www.mysql.com">Mysql</productname> server:</para>
<programlisting language="none">root@goiki:~# aptitude install libmysql-java</programlisting>
<para>This provides the file
/usr/share/java/mysql-connector-java-5.1.16.jar and two symbolic
links:</para>
<programlisting language="none">goik@goiki:~$ cd /usr/share/java
goik@goiki:/usr/share/java$ ls -al mysql*
-rw-r--r-- 1 ... 2011 <emphasis role="bold">mysql-connector-java-5.1.16.jar</emphasis>
lrwxrwxrwx 1 ... 2011 <emphasis role="bold">mysql-connector-java.jar -> mysql-connector-java-5.1.16.jar</emphasis>
lrwxrwxrwx 1 ... 2011 <emphasis role="bold">mysql.jar -> mysql-connector-java.jar</emphasis></programlisting>
</section>
</section>
<section xml:id="lectureNotes">
<title>Lecture related resources</title>
<para>The sources for lecture notes and exercises are available from the
<orgname xlink:href="http://www.mi.hdm-stuttgart.de">MIB</orgname>
<productname xlink:href="http://git-scm.com">git</productname>
repository:</para>
<para><uri
xlink:href="https://version.mi.hdm-stuttgart.de/git/GoikLectures">https://version.mi.hdm-stuttgart.de/git/GoikLectures</uri></para>
<para>Check-out is straightforward:</para>
<programlisting language="none">goik@goiki:~$ mkdir StructuredData;cd StructuredData
goik@goiki:~/StructuredData$ git clone https://version.mi.hdm-stuttgart.de/git/GoikLectures .
Cloning into '.'...
remote: Counting objects: 694, done
...
Resolving deltas: 100% (296/296), done.</programlisting>
<para>After checkout an eclipse workspace holding the complete example
source code becomes visible:</para>
<programlisting language="none">goik@goiki:~/StructuredData$ cd ws/eclipse
goik@goiki:~/StructuredData/ws/eclipse$ ls -al
total 16
drwxr-xr-x 3 goik fb1prof 4096 Nov 8 22:04 .
drwxr-xr-x 4 goik fb1prof 4096 Nov 8 22:04 ..
-rw-r--r-- 1 goik fb1prof 11 Nov 8 22:04 .gitignore
<emphasis role="bold">drwxr-xr-x 6 goik fb1prof 4096 Nov 8 22:04 Jdbc</emphasis></programlisting>
<para>The subdirectory <filename>Jdbc</filename> can be imported as an
eclipse project via <quote>File --&gt; Import --&gt; General --&gt; Existing
Projects into Workspace</quote>. This should enable each participant to
browse and execute the examples provided in the lecture notes. It also
contains a <productname
xlink:href="http://www.mysql.com">Mysql</productname> driver in
<filename>Jdbc/lib/mysql-connector-java-5.1.16.jar</filename>, which is
required to set up a <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
connection.</para>
</section>
<section xml:id="repeatRelational">
<title>Some notes on relational databases</title>
<qandaset defaultlabel="qanda" xml:id="airlineRelationalSchema">
<title>Airlines, airports and flights</title>
<qandadiv>
<qandaentry>
<question>
<para>Implement a relational schema describing airlines,
flights, airports and their respective relationships:</para>
<itemizedlist>
<listitem>
<para>Airline:</para>
<itemizedlist>
<listitem>
<para>An informal unique name such as
<quote>Lufthansa</quote>.</para>
</listitem>
<listitem>
<para>A unique <link
xlink:href="http://en.wikipedia.org/wiki/List_of_airline_codes">ICAO
abbreviation</link>.</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>Destination</para>
<itemizedlist>
<listitem>
<para>Full name like <quote>Frankfurt am Main
International</quote></para>
</listitem>
<listitem>
<para>World airport code like <quote>FRA</quote>.</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>Flight</para>
<itemizedlist>
<listitem>
<para>A unique flight number e.g. LH 4234</para>
</listitem>
<listitem>
<para>The <quote>owning</quote> airline.</para>
</listitem>
<listitem>
<para>originating airport</para>
</listitem>
<listitem>
<para>destination airport</para>
</listitem>
<listitem>
<para>Constraint: origin and destination must differ.
Hint: <productname>Mysql</productname> provides a
syntactical means to implement this constraint. It will
however not be enforced at runtime. Database vendors
like Oracle, IBM/DB2, <productname>Sybase</productname>,
<productname>Informix</productname>
<abbrev>etc.</abbrev> support this type of runtime
integrity constraint enforcement.</para>
</listitem>
</itemizedlist>
</listitem>
</itemizedlist>
<para>Provide surrogate keys for all entities and provide names
for all constraints (<abbrev>e.g.</abbrev> defining
<code>CONSTRAINT _PK_XYZ PRIMARY KEY(...)</code> etc. ).</para>
</question>
<answer>
<programlisting language="sql">CREATE TABLE Airline (
id INT NOT NULL
,name CHAR(20) NOT NULL
,airlineCode CHAR(5) NOT NULL
,CONSTRAINT _PK_Airline_id PRIMARY KEY(id)
,CONSTRAINT _UN_Airline_name UNIQUE(name)
,CONSTRAINT _UN_Airline_airlineCode UNIQUE(airlineCode)
);
CREATE TABLE Destination (
id INT NOT NULL
,fullName CHAR(40) NOT NULL
,airportCode CHAR(5)
,CONSTRAINT _PK_Destination_id PRIMARY KEY(id)
,CONSTRAINT _UN_Destination_airportCode UNIQUE(airportCode)
);
CREATE TABLE Flight (
id INT NOT NULL
,flightNumber CHAR(10) NOT NULL
,airline INT NOT NULL
,origin INT NOT NULL
,destination INT NOT NULL
-- MySQL parses but ignores column-level REFERENCES clauses, and table-level
-- foreign keys must name the referenced column explicitly:
,CONSTRAINT _FK_Flight_airline FOREIGN KEY(airline) REFERENCES Airline(id)
,CONSTRAINT _FK_Flight_origin FOREIGN KEY(origin) REFERENCES Destination(id)
,CONSTRAINT _FK_Flight_destination FOREIGN KEY(destination) REFERENCES Destination(id)
,CONSTRAINT _PK_Flight_id PRIMARY KEY(id)
,CONSTRAINT _UN_Flight_flightNumber UNIQUE(flightNumber)
,CONSTRAINT _CK_Flight_origin_destination CHECK(NOT(origin = destination))
);</programlisting>
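<para>If runtime enforcement is needed on a <productname>Mysql</productname>
5.5 server, a trigger is one possible workaround. The following is a minimal
sketch and not part of the original solution: it assumes the
<code>Flight</code> table from above, and both the trigger name
<code>Flight_check_insert</code> and the error message are made up for
illustration. <code>DELIMITER</code> is a command of the
<command>mysql</command> command line client rather than SQL. A
corresponding <code>BEFORE UPDATE</code> trigger would be needed as well.
Note that <productname>Mysql</productname> releases from 8.0.16 onwards do
enforce <code>CHECK</code> constraints.</para>
<programlisting language="sql">DELIMITER //
-- Hypothetical trigger: reject flights whose origin equals their destination
CREATE TRIGGER Flight_check_insert BEFORE INSERT ON Flight
FOR EACH ROW
BEGIN
IF NEW.origin = NEW.destination THEN
SIGNAL SQLSTATE '45000'
SET MESSAGE_TEXT = 'origin and destination must differ';
END IF;
END//
DELIMITER ;</programlisting>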
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="toolingConfigJdbc">
<title>Tooling: Configuring and using the <link
xlink:href="http://www.eclipse.org/datatools">Eclipse database
development</link> plugin</title>
<para>For basic SQL communication the Eclipse environment offers a
standard plugin (Database development). Establishing connections to a
specific database server generally requires prior installation of a
<trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
driver on the client side, as shown in the following video:</para>
<figure xml:id="figureConfigJdbcDriver">
<title>Adding a <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
Driver for <productname
xlink:href="http://www.mysql.com">Mysql</productname> to the database
plugin.</title>
<mediaobject>
<videoobject>
<videodata fileref="Ref/Video/jdbcDriverConfig.mp4"/>
</videoobject>
</mediaobject>
</figure>
<para>During the exercises the eclipse database development perspective
may be used to browse and structure SQL tables and data. The following
video demonstrates the configuration of a <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
connection to a local (<varname>localhost</varname> network interface)
database server. With respect to the introduction given in <xref
linkend="mysql"/> we assume the existence of a database <code>hdm</code>
and a corresponding account <quote>hdmuser</quote> and password
<quote><code>XYZ</code></quote> (<emphasis role="bold">capital
letters!</emphasis>) on our database server.</para>
<figure xml:id="figureConfigJdbcConnection">
<title>Configuring a <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
connection to a (local) <productname
xlink:href="http://www.mysql.com">Mysql</productname> database
server.</title>
<mediaobject>
<videoobject>
<videodata fileref="Ref/Video/jdbcConnection.mp4"/>
</videoobject>
</mediaobject>
</figure>
<para>We are now ready to communicate with our database server. The last
video in this section shows some basic SQL tasks:</para>
<figure xml:id="figureEclipseBasicSql">
<title>Executing SQL statements, browsing schema and retrieving
data</title>
<mediaobject>
<videoobject>
<videodata fileref="Ref/Video/eclipseBasicSql.mp4"/>
</videoobject>
</mediaobject>
</figure>
</section>
</chapter>
<chapter xml:id="xmlIntro">
<title>Introduction to XML</title>
<section xml:id="xmlBasic">
<title>The XML industry standard</title>
<para>A short question might be: <quote>What is XML?</quote> An answer
might be: The acronym XML stands for
<quote>E<emphasis>x</emphasis>tensible <emphasis>M</emphasis>arkup
<emphasis>L</emphasis><foreignphrase>anguage</foreignphrase></quote> and
is an industry standard published by the W3C standardization
organization. As with other industry software standards, talking about XML
means talking about XML based software: applications and frameworks
that add value for software implementors and enhance data
exchange between applications.</para>
<para>Many readers are already familiar with XML without explicitly
referring to the standard itself: The world wide web's
<foreignphrase>lingua franca</foreignphrase> HTML has been ported to an
XML dialect forming the <link
xlink:href="http://www.w3.org/MarkUp">XHTML</link> standard. The idea
behind this standard is to distinguish between an abstract markup
language and the rendered results a browser generates from so-called
document instances:</para>
<figure xml:id="renderXhtmlMarkup">
<title>Rendering XHTML markup</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/xhtml.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>XHTML is actually a good example to illustrate the tree-like,
hierarchical structure of XML documents:</para>
<figure xml:id="xhtmlTree">
<title>XHTML tree structure</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/xhtmlexample.fig" scale="65"/>
</imageobject>
</mediaobject>
</figure>
<para>We may extend this example by representing a mathematical formula
via a standard called <link
xlink:href="http://www.w3.org/Math">MathML</link>:</para>
<figure xml:id="mathmlExample">
<title>A formula in <link
xlink:href="http://www.w3.org/Math">MathML</link>
representation.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/sqrtrender.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>Again we observe a similar situation: A database like
<emphasis>representation</emphasis> of a formula on the left and a
<emphasis>rendered</emphasis> version on the right. Regarding XML we
have:</para>
<itemizedlist>
<listitem>
<para>The <link xlink:href="http://www.w3.org/Math">MathML</link>
standard intended to describe mathematical formulas. The standard
defines a set of <emphasis>tags</emphasis> such as <tag
class="starttag">math:msqrt</tag> with well-defined semantics
regarding permitted attribute values and nesting rules.</para>
</listitem>
<listitem>
<para>Informal descriptions of formatting expectations.</para>
</listitem>
<listitem>
<para>Software transforming an XML formula representation into
visible or printable output. In other words: A rendering
engine.</para>
</listitem>
</itemizedlist>
<para>XML documents may also be regarded as a persistence mechanism to
represent and store data. Similarities to relational database systems
exist. An RDBMS
(<emphasis>R</emphasis><foreignphrase>elational</foreignphrase>
<emphasis>D</emphasis><foreignphrase>atabase</foreignphrase>
<emphasis>M</emphasis><foreignphrase>anagement</foreignphrase>
<emphasis>S</emphasis><foreignphrase>ystem</foreignphrase>) is typically
capable of holding terabytes of data organized in tables. The
arrangement of data may be subject to various constraints such as
candidate or foreign key rules. For both end users and
software developers an RDBMS itself is only one building block of a
complete solution: an application on top of it is needed as a user
interface to the data it contains.</para>
<para>In contrast to an RDBMS, XML allows data to be organized
hierarchically. The <link
xlink:href="http://www.w3.org/Math">MathML</link> representation given
in <xref linkend="mathmlExample"/> may be visualized as a tree:</para>
<figure xml:id="mathmltree">
<title>A tree graph representation of the <link
xlink:href="http://www.w3.org/Math">MathML</link> example given
before.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/sqrtree.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>CAD applications may use XML documents as a representation of
graphical primitives:</para>
<informalfigure>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/attributes.fig" scale="65"/>
</imageobject>
</mediaobject>
</informalfigure>
<para>Of course an RDBMS also allows the representation of tree-like
structures or arbitrary graphs. But these have to be modelled by using
foreign key constraints since relational tables themselves have a
<quote>flat</quote> structure. Some RDBMS vendors provide extensions to
the SQL standard which allow <quote>native</quote> representations of
<link linkend="gloss_XML"><abbrev>XML</abbrev></link> documents.</para>
</section>
<section xml:id="xmlHtml">
<title>Well-formed XML documents</title>
<para>The general structure of an <link
linkend="gloss_XML"><abbrev>XML</abbrev></link> document is as
follows:</para>
<figure xml:id="xmlbase">
<title><link linkend="gloss_XML"><abbrev>XML</abbrev></link> basic
structure</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/xmlbase.fig" scale="65"/>
</imageobject>
</mediaobject>
</figure>
<para>We explore a simple XML document representing messages like
E-mails:</para>
<figure xml:id="memoWellFormed">
<title>The representation of a short message.</title>
<programlisting language="none"><?xml<co
xml:id="first_xml_code_magic"/> version="1.0"<co
xml:id="first_xml_code_version"/> encoding="UTF-8"<co
xml:id="first_xml_code_encoding"/>?>
<memo><co xml:id="first_xml_code_topelement"/>
<from>M. Goik</from><co xml:id="first_xml_code_from"/>
<to>B. King</to>
<to>A. June</to>
<subject>Best wishes</subject>
<content>Hi all, congratulations to your splendid party</content>
</memo></programlisting>
</figure>
<calloutlist>
<callout arearefs="first_xml_code_magic">
<para>The very first characters <code><?xml</code> may be
regarded as a <link
xlink:href="http://en.wikipedia.org/wiki/Magic_number_(programming)">magic
number string</link> used as a format indicator which allows
distinguishing between different file types, e.g. GIF, JPEG, HTML and
so on.</para>
</callout>
<callout arearefs="first_xml_code_version">
<para>The <code>version="1.0"</code> attribute tells us that the
document conforms to version 1.0 of the <link
xlink:href="http://www.w3.org/TR/xml">XML</link> standard.
This way a document can express its conformance to the version
1.0 standard even if this standard later evolves to a higher
version, e.g. <code>version="2.1"</code>.</para>
</callout>
<callout arearefs="first_xml_code_encoding">
<para>The attribute <code>encoding="UTF-8"</code> tells us that all
text in the current document is stored in UTF-8, an encoding of <link
xlink:href="http://unicode.org">Unicode</link>. <link
xlink:href="http://unicode.org">Unicode</link> is a widely accepted
industry standard for character encoding. Thus European, Cyrillic and
most Asian characters may be used in documents
<emphasis>simultaneously</emphasis>. Other encodings may limit the
set of allowed characters, e.g. <code>encoding="ISO-8859-1"</code>
only covers characters belonging to western European languages.
However, a system also needs to have the corresponding fonts (e.g.
TrueType) installed in order to render the document
appropriately. A document containing Chinese characters is of no use
if the underlying rendering system lacks e.g. a set of Chinese
TrueType fonts.</para>
</callout>
<callout arearefs="first_xml_code_topelement">
<para>An XML document has exactly one top level
<emphasis>node</emphasis>. In contrast to the HTML standard these
nodes are commonly called elements rather than tags. In this example
the top level (root) element is <tag
class="starttag">memo</tag>.</para>
</callout>
<callout arearefs="first_xml_code_from">
<para>Each XML element like <tag class="starttag">from</tag> has a
corresponding counterpart <tag class="endtag">from</tag>. In terms
of XML we say each element that has been opened has to be closed. In
conjunction with the preceding point this is equivalent to the fact
that each XML document represents a tree structure as shown in
the <link linkend="mathmltree">tree graph</link>
representation.</para>
</calloutlist>
<para>As with the introductory formula example this representation
is of limited use on its own: In an office environment we need a
rendered version, either in print or in some online format
like E-Mail or HTML.</para>
<para>From a software developer's point of view we may use a piece of
software called a <emphasis>parser</emphasis> to test the document's
standard conformance. At the MI department we may simply invoke
<userinput><command>xmlparse</command> message.xml</userinput> to start
a check:</para>
<programlisting language="none"><errortext>goik>xmlparse wellformed.xml
Parsing was successful</errortext></programlisting>
<para>Various XML related plugins are supplied for the <productname
xlink:href="http://eclipse.org">eclipse platform</productname> like the
<productname xlink:href="http://oxygenxml.com">Oxygen
software</productname> supplying <quote>live</quote> conformance
checking while editing XML documents. Now we test our assumptions by
violating some of the rules stated before. We deliberately omit the
closing tag <tag class="endtag">from</tag>:</para>
<figure xml:id="omitFrom">
<title>A document which is not well-formed due to the omission of <tag
class="endtag">from</tag>.</title>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<memo>
<from>M. Goik <co xml:id="omitFromMissingElement"/>
<to>B. King</to>
<to>A. June</to>
<subject>Best wishes</subject>
<content>Hi all, congratulations to your splendid party</content>
</memo></programlisting>
<calloutlist>
<callout arearefs="omitFromMissingElement">
<para>The opening element <tag class="starttag">from</tag> is not
terminated by <tag class="endtag">from</tag>.</para>
</callout>
</calloutlist>
</figure>
<para>Consequently the parser's output reads:</para>
<programlisting language="none"><errortext>goik>xmlparse omitfrom.xml
file:///ma/goik/workspace/Vorlesungen/Input/Memo/omitfrom.xml:8:3:
fatal error org.xml.sax.SAXParseException: The element type "from"
must be terminated by the matching end-tag "</from>". parsing error</errortext></programlisting>
<para>Experienced HTML authors may be confused since HTML tolerates
some missing end tags: In fact HTML is not an
XML standard. Instead HTML belongs to the set of SGML applications. SGML
is a much older standard, namely the <emphasis>Standard Generalized
Markup Language</emphasis>.</para>
<para>Even if every XML element has a closing counterpart the resulting
document may still not be well-formed:</para>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<memo>
<from>M. Goik<to>B. King</from></to>
<to>A. June</to>
<subject>Best wishes</subject>
<content>Hi all, congratulations to your splendid party</content>
</memo></programlisting>
<para>The parser echoes:</para>
<programlisting language="none"><computeroutput>file:///ma/goik/workspace/Vorlesungen/Input/Memo/nonest.xml:3:29:
fatal error org.xml.sax.SAXParseException: The element type "to" must be
terminated by the matching end-tag "</to>". parsing error</computeroutput></programlisting>
<para>This type of error is caused by so-called improper nesting of
elements: The element <tag class="starttag">from</tag> is closed before
the <quote>inner</quote> element <tag class="starttag">to</tag> has been
closed. This violates the requirement that an XML document be
expressible as a tree-like structure. The situation may be resolved by
choosing:</para>
<programlisting language="none">...<from>M. Goik<to>B. King</to></from>...</programlisting>
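<para>The <command>xmlparse</command> command is specific to the MI
department. As a rough stand-in, the following sketch assumes Python's
standard library parser <code>xml.etree.ElementTree</code>; the helper
name <code>well_formedness_error</code> and the shortened memo documents
are illustrative assumptions, not part of the lecture material. The
wording of the error messages differs from the SAX parser output shown
above.</para>

```python
import xml.etree.ElementTree as ET


def well_formedness_error(document: str):
    """Return None if the document is well-formed, else the parser's message.

    Hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(document)
        return None
    except ET.ParseError as error:
        return str(error)


# Shortened variants of the memo examples discussed above.
WELL_FORMED = "<memo><from>M. Goik</from><to>B. King</to></memo>"
MISSING_END_TAG = "<memo><from>M. Goik<to>B. King</to></memo>"
IMPROPER_NESTING = "<memo><from>M. Goik<to>B. King</from></to></memo>"

for name, document in [("well-formed", WELL_FORMED),
                       ("missing end tag", MISSING_END_TAG),
                       ("improper nesting", IMPROPER_NESTING)]:
    # Print either the parser's complaint or a success note.
    print(name, "->", well_formedness_error(document) or "no error")
```

<para>Only the first document passes. The other two are rejected by the
parser before any application code gets to see their content.</para>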
<para>We provide two examples illustrating proper and improper nesting
of XML elements:</para>
<figure xml:id="fig_nestingProper">
<title>Proper nesting of XML elements</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/propernest.fig" scale="65"/>
</imageobject>
</mediaobject>
</figure>
<para>The following example violates the proper nesting constraint and
thus is not a well-formed XML document:</para>
<figure xml:id="fig_improperNest">
<title>Improperly nested elements</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/impropernest.fig"/>
</imageobject>
</mediaobject>
</figure>
<!-- goik:later
<para>An animation showing the usage of the Oxygen plug in for the
examples given above can be found <uri
xlink:href="src/viewlet/wellformed/wellformed_viewlet_swf.html">here</uri>.</para>
-->
<para>XML elements may have so called attributes like <tag
class="attribute">date</tag> in the following example:</para>
<figure xml:id="memoWellAttrib">
<title>An XML document with attributes.</title>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<memo date="10.02.2006" priority="high">
<from>M. Goik</from>
<to>B. King</to>
<to>A. June</to>
<subject>Best wishes</subject>
<content>Hi all, congratulations to your splendid party</content>
</memo></programlisting>
</figure>
<para>The conformance of an XML document with the following rules may be
verified by invoking a parser:</para>
<itemizedlist>
<listitem>
<para>Within the <emphasis>scope</emphasis> of a given element an
attribute name must be unique. In the example above one may not
define a second attribute <varname>date="..."</varname> within the
same element <memo ... >. This reflects the usual programming
language semantics of attributes: In a <link
linkend="gloss_Java"><trademark>Java</trademark></link> class an
attribute is represented by a unique identifier and thus cannot
appear twice.</para>
</listitem>
<listitem>
<para>An attribute value must be enclosed either in single (') or
double (") quotes. This is different from the HTML standard which
allows attribute values without quotes provided the given attribute
value does not give rise to ambiguities. For example <tag
class="starttag">td align=left</tag> is allowed since the attribute
value <tag class="attvalue">left</tag> does not contain any spaces
thus allowing a parser to recognize the end of the value's
definition.</para>
</listitem>
</itemizedlist>
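<para>Both attribute rules can be observed with any conforming parser.
The sketch below again assumes Python's standard library module
<code>xml.etree.ElementTree</code>; the function name
<code>parses</code> and the second date value are made up for
illustration.</para>

```python
import xml.etree.ElementTree as ET


def parses(document: str) -> bool:
    """Report whether the parser accepts the document (illustrative helper)."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Unique attribute names: accepted.
unique_names = parses('<memo date="10.02.2006" priority="high"/>')
# The attribute 'date' appears twice within the same element: rejected.
duplicate_name = parses('<memo date="10.02.2006" date="11.02.2006"/>')
# HTML style unquoted attribute value: rejected by an XML parser.
unquoted_value = parses('<td align=left/>')
# Single quotes are as good as double quotes: accepted.
single_quoted = parses("<td align='left'/>")

print(unique_names, duplicate_name, unquoted_value, single_quoted)
```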
<qandaset defaultlabel="qanda" xml:id="example_memoAttribTree">
<title>A graphical representation of a memo.</title>
<qandadiv>
<qandaentry>
<question>
<para>Draw a graphical representation, similar to <xref
linkend="mathmltree"/>, of the memo document given in <xref
linkend="memoWellAttrib"/>.</para>
</question>
<answer>
<para>The <link linkend="memoWellAttrib">memo document's</link>
structure may be visualized as:</para>
<informalfigure xml:id="memotreeFigure">
<para>A graphical representation of <xref
linkend="memoWellAttrib"/>:</para>
<informalfigure xml:id="memotreeFigureFalse">
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/memotree.fig"/>
</imageobject>
</mediaobject>
</informalfigure>
<para>The sequence of <emphasis>element</emphasis> child nodes
is important in XML and has to be preserved. Only the order of
the two attributes <tag class="attribute">date</tag> and <tag
class="attribute">priority</tag> is undefined: They actually
belong to the <tag class="starttag">memo</tag> node serving as
a dictionary with the attribute names being the keys and the
attribute values being the values of the dictionary.</para>
</informalfigure>
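<para>The dictionary view of attributes can be made concrete. As a
sketch, assuming Python's <code>xml.etree.ElementTree</code>, the
attributes of an element are exposed as a dictionary while the child
elements form an ordered sequence. The variable names are illustrative
only.</para>

```python
import xml.etree.ElementTree as ET

MEMO = """<memo date="10.02.2006" priority="high">
  <from>M. Goik</from>
  <to>B. King</to>
  <to>A. June</to>
  <subject>Best wishes</subject>
  <content>Hi all, congratulations to your splendid party</content>
</memo>"""

# The same memo with the two attributes written in reverse order.
SWAPPED = MEMO.replace('date="10.02.2006" priority="high"',
                       'priority="high" date="10.02.2006"')

memo = ET.fromstring(MEMO)
swapped = ET.fromstring(SWAPPED)

# Attributes behave like a dictionary: the order of appearance is irrelevant.
same_attributes = memo.attrib == swapped.attrib

# Child elements form a sequence: their document order is preserved.
child_names = [child.tag for child in memo]

print(same_attributes)
print(child_names)
```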
</answer>
</qandaentry>
<qandaentry xml:id="example_attribInQuotes">
<question>
<label>Attributes and quotes</label>
<para>As stated before XML attributes have to be enclosed in
single or double quotes. Construct an XML document with mixed
quotes like <code><date day="monday'></code>. How does the
parser react? Find the corresponding syntax definition of legal
attribute values in the <link
xlink:href="http://www.w3.org/TR/xml">XML standard W3C
Recommendation</link>.</para>
</question>
<answer>
<para>The parser flags a mixture of single and double quotes for
a given attribute as an error. The XML standard <link
xlink:href="http://www.w3.org/TR/xml#NT-AttValue">defines</link>
the syntax of attribute values: An attribute value has to be
enclosed <emphasis>either</emphasis> in two single
<emphasis>or</emphasis> in two double quotes, see
<uri
xlink:href="http://www.w3.org/TR/xml/#NT-AttValue">http://www.w3.org/TR/xml/#NT-AttValue</uri>.</para>
</answer>
</qandaentry>
<qandaentry xml:id="quoteInAttributValue">
<question>
<label>Quotes as part of an attribute's value?</label>
<para>Single and double quotes are used to delimit an attribute
value. May quotes themselves appear as part of an attribute's
value, e.g. in a person's name like <code>Gary "King"
Mandelson</code>?</para>
</question>
<answer>
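<para>Attribute values may contain double quotes if the
attribute's value is enclosed in single quotes and vice versa. As
a limitation the value of an attribute may not contain literal single
quotes and literal double quotes at the same time. In that case the
delimiting kind of quote has to be written as the predefined entity
<code>&quot;</code> or <code>&apos;</code>:</para>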
<informalfigure xml:id="exampleSingleDoubleQuotes">
<para>Quotes as part of attribute values.</para>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<test>
<person name='Gary "King" Mandelson'/> <!-- o.k. -->
<person name="Gary 'King' Mandelson"/> <!-- o.k. -->
<person name="Gary 'King 'S.' "Mandelson"'/> <!-- oops! -->
</test></programlisting>
</informalfigure>
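<para>The first two variants can be checked with a parser. The sketch
below assumes Python's <code>xml.etree.ElementTree</code> and adds a
third variant which is not part of the example above: it uses the
predefined entity <code>&quot;</code> from the XML standard, so that
both kinds of quotes end up in a single attribute value.</para>

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<test>
  <person name='Gary "King" Mandelson'/>
  <person name="Gary 'King' Mandelson"/>
  <person name="Gary &quot;King&quot; 'S.' Mandelson"/>
</test>"""

# The parser strips the delimiting quotes and resolves entity references.
names = [person.get("name") for person in ET.fromstring(DOCUMENT)]

for name in names:
    print(name)
```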
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<para>Some of the constraints imposed on XML documents by the standard,
as discussed so far, may be summarized as:</para>
<itemizedlist>
<listitem>
<para>An XML document must have exactly one top level
element.</para>
</listitem>
<listitem>
<para>Elements have to be properly nested. An element must not be
closed if an <quote>inner</quote> element is still open.</para>
</listitem>
<listitem>
<para>Attribute names within a given element must be unique.</para>
</listitem>
<listitem>
<para>Attribute values <emphasis>must</emphasis> be quoted
correctly.</para>
</listitem>
</itemizedlist>
<para>These rules show several differences from the HTML
standard: HTML permits unquoted attribute values, and a lot of HTML
elements don't have to be closed. For example
paragraphs (<tag class="starttag">p</tag>) or images (<tag
class="starttag">img src='foo.gif'</tag>) don't have to be closed
explicitly. This is due to the fact that HTML used to be defined in
accordance with the older <emphasis><emphasis
role="bold">S</emphasis>tandard <emphasis
role="bold">G</emphasis>eneralized <emphasis
role="bold">M</emphasis>arkup <emphasis
role="bold">L</emphasis>anguage</emphasis> (SGML) standard.</para>
<para>These constraints are part of the definition of a <link
xlink:href="http://www.w3.org/TR/xml#sec-well-formed">well-formed
document</link>. The specification imposes additional constraints for a
document to be well-formed.</para>
</section>
</chapter>
<chapter xml:id="dtd">
<title>Beyond well-formedness</title>
<section xml:id="motivationDdt">
<title>Motivation</title>
<para>So far we are able to create XML documents containing
hierarchically structured data. We may nest elements and thus create
tree structures of arbitrary depth. The only restrictions imposed
by the XML standard are the constraints of well-formedness. For many
purposes in software development this is not sufficient.</para>
<para>A company named <productname>Softmail</productname> might
implement an email system which uses <link
linkend="memoWellAttrib">memo</link> document files as low level data
representation serving as a persistence layer. Now a second company
named <productname>Hardmail</productname> wants to integrate mails
generated by <productname>Softmail</productname>'s system into its own
business product. The <productname>Hardmail</productname> software
developers might <emphasis>infer</emphasis> the logical structure of
<productname>Softmail</productname>'s email representation but the
following problems arise:</para>
<itemizedlist>
<listitem>
<para>The logical structure will in practice become more complex:
E-mails may contain attachments leading to multi part messages.
Additional header information is required for standard Internet mail
compliance. Both add complexity to the XML structure
required for data representation. Relying only on
well-formedness, the specification of an internal E-mail format can
only be achieved <emphasis>informally</emphasis>. Thus a rule like
<quote>Each E-mail must have a subject</quote> may be written down
in the specification. A software developer will code these rules but
will probably make mistakes as the set of rules grows.</para>
<para>In contrast an RDBMS based solution allows solving such
problems in a declarative manner: A developer may use a <code>NOT
NULL</code> constraint on a subject attribute of type
<code>VARCHAR</code> thus prohibiting missing subjects.</para>
</listitem>
<listitem>
<para>As <productname>Softmail</productname>'s product evolves its
internal E-mail XML format is subject to change due to functional
extensions and possibly bug fixes both giving rise to
interoperability problems.</para>
</listitem>
</itemizedlist>
<para>Generally speaking well-formed XML documents lack grammar
constraints of the kind available for programming languages. In case of
an RDBMS developers can impose primary key, foreign key and
<code>CHECK</code> constraints in a <emphasis>declarative</emphasis>
manner rather than hard coding them into their applications (a solution
bad programmers favour, though). Various XML standards exist for
declarative constraint definitions, namely:</para>
<itemizedlist>
<listitem>
<para>DTDs</para>
</listitem>
<listitem>
<para><link xlink:href="http://www.w3.org/XML/Schema">XML
Schema</link></para>
</listitem>
<listitem>
<para><link
xlink:href="http://www.relaxng.org">RelaxNG</link></para>
</listitem>
</itemizedlist>
</section>
<section xml:id="dtdBasic">
<title>XML Schema</title>
<section xml:id="dtdFirstExample">
<title>Structural descriptions for documents</title>
<para>As an example we choose documents of type
<emphasis>memo</emphasis> as a starting point. Documents like the
example from <xref linkend="memoWellAttrib"/> may be
<emphasis>informally</emphasis> described to be a sequence of the
following mandatory items:</para>
<figure xml:id="figure_memo_informalconstraints">
<title>Informal constraints on <tag class="element">memo</tag>
document instances</title>
<itemizedlist>
<listitem>
<para><emphasis>Exactly one</emphasis> sender.</para>
</listitem>
<listitem>
<para><emphasis>One or more</emphasis> recipients.</para>
</listitem>
<listitem>
<para>Subject</para>
</listitem>
<listitem>
<para>Content</para>
</listitem>
</itemizedlist>
<para>In addition we have:</para>
<itemizedlist>
<listitem>
<para>A date string <emphasis>must</emphasis> be supplied</para>
</listitem>
<listitem>
<para>A priority <emphasis>may</emphasis> be supplied with
allowed values to be chosen from the set of values <tag
class="attvalue">low</tag>, <tag class="attvalue">medium</tag>
or <tag class="attvalue">high</tag>.</para>
</listitem>
</itemizedlist>
</figure>
<para>All these fields contain ordinary text to be filled in by a user
and shall appear exactly in the defined order. For simplicity we do
not care about email address syntax rules being described in <link
xlink:href="http://www.w3.org/Protocols/rfc822">RFC based address
schemes</link>. We will see how the <emphasis>constraints</emphasis>
mentioned above can be modelled in XML by an extension to the concept
of well formed documents.</para>
</section>
<section xml:id="section_memo_machinereadable">
<title>A machine readable description</title>
<para>We now introduce an example of an XML schema. It allows for the
specification of additional constraints to both element nodes and
their attributes. Our set of <link
linkend="figure_memo_informalconstraints" revision="">informal
constraints</link> on memo documents may be expressed as:</para>
<figure xml:id="figure_memo_dtd">
<title>A schema to describe memo documents.</title>
<programlisting language="none"><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified"
vc:minVersion="1.0" vc:maxVersion="1.1">
<xs:element name="memo">
<xs:complexType>
<xs:sequence> <co xml:id="memodtd_memodef"/>
<xs:element name="from" type="xs:string"/> <co
xml:id="memodtd_elem_from"/>
<xs:element name="to" minOccurs="1" maxOccurs="unbounded" type="xs:string"/>
<xs:element name="subject" type="xs:string"/>
<xs:element name="content" type="xs:string"/>
</xs:sequence>
<xs:attribute name="date" type="xs:date" use="required"/> <co
xml:id="memodtd_memo_attribs"/>
<xs:attribute name="priority" type="Priority" use="optional"/>
</xs:complexType>
</xs:element>
<xs:simpleType name="Priority">
<xs:restriction base="xs:string">
<xs:enumeration value="low"/>
<xs:enumeration value="medium"/>
<xs:enumeration value="high"/>
</xs:restriction>
</xs:simpleType>
</xs:schema></programlisting>
<calloutlist>
<callout arearefs="memodtd_memodef">
<para>A <tag class="element">memo</tag> consists of a sender, at
least one recipient, a subject and content.</para>
</callout>
<callout arearefs="memodtd_memo_attribs">
<para>A <tag class="element">memo</tag> has one required
attribute <varname>date</varname> and an optional attribute
<varname>priority</varname> restricted to the three
allowed values <tag class="attvalue">low</tag>, <tag
class="attvalue">medium</tag> and <tag
class="attvalue">high</tag>, which are defined by a separate <tag
class="starttag">xs:simpleType</tag> definition.</para>
</callout>
<callout arearefs="memodtd_elem_from">
<para>A <tag class="starttag">from</tag> element consists of
ordinary text. This disallows XML markup. For example
<code><from>Smith & partner</from></code> is
disallowed since XML uses the ampersand (&) to denote the
beginning of an entity reference like <tag class="genentity">auml</tag>
for the German a-umlaut (ä). The correct form is
<code><from>Smith &amp; partner</from></code>
using the predefined entity <tag class="genentity">amp</tag> as
an escape sequence for the ampersand.</para>
<para><code>type="xs:string"</code> is a built in XML Schema
type representing a restricted version of ordinary strings.
Without digging into details an <code>xs:string</code> value
must not contain any markup code like e.g. <tag
class="starttag">msqrt</tag>. This ensures that a string does
not interfere with the document's XML markup.</para>
</callout>
</calloutlist>
</figure>
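<para>Software generating XML should escape such characters rather than
rely on authors doing it by hand. As a sketch, assuming Python's
standard library (<code>xml.sax.saxutils.escape</code> and
<code>xml.etree.ElementTree</code>), the ampersand example reads as
follows; the variable names are illustrative only.</para>

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "Smith & partner"

# escape() replaces '&', '<' and '>' by the predefined entities.
escaped_text = escape(raw_text)
element = ET.fromstring(f"<from>{escaped_text}</from>")

# The unescaped ampersand starts an incomplete entity reference.
try:
    ET.fromstring(f"<from>{raw_text}</from>")
    unescaped_accepted = True
except ET.ParseError:
    unescaped_accepted = False

print(escaped_text)
print(element.text)
print(unescaped_accepted)
```

<para>After parsing, the application sees the original text again: the
escape sequence only exists in the serialized document.</para>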
<para>Note that our schema is itself an XML document.</para>
<para>From the viewpoint of software modeling an XML Schema instance
is a <emphasis>schema</emphasis> describing the syntax of a class of
XML document instances adhering to it. In the context of XML
technologies <link xlink:href="http://www.w3.org/XML/Schema">XML
Schema</link> is one of several language alternatives which allow for
XML document structure descriptions.</para>
<para>Readers being familiar with <abbrev
xlink:href="http://en.wikipedia.org/wiki/Backus-Naur_form">BNF</abbrev>
or <abbrev
xlink:href="http://en.wikipedia.org/wiki/Extended_Backus_Naur_form">EBNF</abbrev>
will be able to understand the grammatical rules being expressed
here.</para>
<productionset>
<title>A message of type <tag class="starttag">memo</tag></title>
<production xml:id="memo.ebnf.memo">
<lhs>Memo Message</lhs>
<rhs>'<memo>' <nonterminal
def="#memo.ebnf.sender">Sender</nonterminal> (<nonterminal
def="#memo.ebnf.recipient">Recipient</nonterminal>)+ <nonterminal
def="#memo.ebnf.subject">Subject</nonterminal> <nonterminal
def="#memo.ebnf.content">Content</nonterminal>
'</memo>'</rhs>
</production>
<production xml:id="memo.ebnf.sender">
<lhs>Sender</lhs>
<rhs>'<from>' <nonterminal def="#memo.ebnf.text"> Text
</nonterminal> '</from>'</rhs>
</production>
<production xml:id="memo.ebnf.recipient">
<lhs>Recipient</lhs>
<rhs>'<to>' <nonterminal def="#memo.ebnf.text"> Text
</nonterminal> '</to>'</rhs>
</production>
<production xml:id="memo.ebnf.subject">
<lhs>Subject</lhs>
<rhs>'<subject>' <nonterminal def="#memo.ebnf.text"> Text
</nonterminal> '</subject>'</rhs>
</production>
<production xml:id="memo.ebnf.content">
<lhs>Content</lhs>
<rhs>'<content>' <nonterminal def="#memo.ebnf.text"> Text
</nonterminal> '</content>'</rhs>
</production>
<production xml:id="memo.ebnf.text">
<lhs>Text</lhs>
<rhs>[a-zA-Z0-9]* <lineannotation>In real documents this is too
restrictive!</lineannotation></rhs>
</production>
</productionset>
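<para>The production for a memo message (one sender, one or more
recipients, a subject and a content) can be mimicked by a regular
expression over the names of the child elements. The following is a
minimal sketch assuming Python's standard library; it is
<emphasis>not</emphasis> a schema validator, it ignores attributes and
text content, and the names <code>MEMO_CHILDREN</code> and
<code>matches_memo_grammar</code> are made up for illustration.</para>

```python
import re
import xml.etree.ElementTree as ET

# Sender, one or more Recipient, Subject, Content, expressed over the
# space separated sequence of child element names.
MEMO_CHILDREN = re.compile(r"from( to)+ subject content")


def matches_memo_grammar(document: str) -> bool:
    """Check only the sequence of child elements below <memo>."""
    root = ET.fromstring(document)
    if root.tag != "memo":
        return False
    child_names = " ".join(child.tag for child in root)
    return MEMO_CHILDREN.fullmatch(child_names) is not None


VALID = ("<memo><from>a</from><to>b</to><to>c</to>"
         "<subject>s</subject><content>t</content></memo>")
NO_RECIPIENT = "<memo><from>a</from><subject>s</subject><content>t</content></memo>"
WRONG_ORDER = ("<memo><from>a</from><to>b</to>"
               "<content>t</content><subject>s</subject></memo>")

print(matches_memo_grammar(VALID))
print(matches_memo_grammar(NO_RECIPIENT))
print(matches_memo_grammar(WRONG_ORDER))
```

<para>Hand-written checks of this kind are exactly what a schema aware
parser makes unnecessary: the grammar is declared once and enforced by
the parser.</para>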
<para>We may as well supply a graphical representation:</para>
<figure xml:id="extendContModelGraph">
<title>Graphical representation of the extended <code>content</code>
model.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/contentmixed.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>In comparison to our informal description of memo documents a
schema offers an added value: The grammar is machine readable and may
thus become input to a parser, which is then able to check
whether an XML document obeys the constraints imposed. So the
parser must be instructed to use a schema in addition to the XML
document in question. For this purpose an XML document may define a
reference to a schema:</para>
<figure xml:id="memo_external_dtd">
<title>A memo document instance holding a reference to an
external schema.</title>
<programlisting language="none"><memo <co
xml:id="memo_external_dtd_top_element"/> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="memo.xsd" <co
xml:id="memo_external_dtd_url"/>
date="2014-09-24" priority="high">
<from>M. Goik</from>
<to>B. King</to>
<to>A. June</to>
<subject>Best wishes</subject>
<content>Hi all, congratulations to your splendid party</content>
</memo></programlisting>
<calloutlist>
<callout arearefs="memo_external_dtd_top_element">
<para>The element <tag class="starttag">memo</tag> is chosen to
be the top (root) element of the document's tree. It must be
defined in our schema <filename>memo.xsd</filename>. This is
really a choice since an XML schema defines a
<emphasis>set</emphasis> of elements in
<emphasis>arbitrary</emphasis> order. There is no such rule as
<quote>define before use</quote>. So an XML schema does not tell
us which element has to appear on top of a document.</para>
<para>Suppose a given XML schema offers both <tag
class="starttag">book</tag> and <tag
class="starttag">report</tag> elements. An XML author writing a
complex document will choose <tag class="starttag">book</tag> as
top level element rather than <tag class="starttag">report</tag>,
which is more appropriate for a small piece of documentation.
Consequently it is an XML author's <emphasis>choice</emphasis>
which of the elements defined in a schema shall appear as
<emphasis>the</emphasis> top level element.</para>
</callout>
<callout arearefs="memo_external_dtd_url">
<para>The address of the schema's rule set. In the given example
it is just a filename but it may as well be an <link
xlink:href="http://www.w3.org/Addressing">URL</link> of type
<abbrev
xlink:href="http://en.wikipedia.org/wiki/File_Transfer_Protocol">ftp</abbrev>,
<abbrev xlink:href="http://www.w3.org/Protocols">http</abbrev>
and so on, see <xref linkend="memoDtdOnFtp"/>.</para>
</callout>
</calloutlist>
</figure>
<para>In the presence of a schema parsing a document is actually a two
step process: First the parser will check the document for
well-formedness. Then the parser will read the referenced schema
<filename>memo.xsd</filename> and check the document for the
additional constraints defined within.</para>
<para>In the current example both the schema and the XML memo document
reside as text files in a common file system folder. For general use a
schema is usually kept at a centralized location. The value of the
attribute <varname>xsi:noNamespaceSchemaLocation</varname> is actually a
<emphasis>U</emphasis><foreignphrase>niform</foreignphrase>
<emphasis>R</emphasis><foreignphrase>esource</foreignphrase>
<emphasis>L</emphasis><foreignphrase>ocator</foreignphrase> <link
xlink:href="http://www.w3.org/Addressing">(URL)</link>. Thus our
<filename>memo.xsd</filename> may also be supplied as a <abbrev
xlink:href="http://www.w3.org/Protocols">http</abbrev> or <abbrev
xlink:href="http://en.wikipedia.org/wiki/File_Transfer_Protocol">ftp</abbrev>
<link xlink:href="http://www.w3.org/Addressing">URL</link>:</para>
<figure xml:id="memoDtdOnFtp">
<title>A schema reference to a web server.</title>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<memo ... xsi:noNamespaceSchemaLocation="https://someserver.org/memo.xsd">
<from>M. Goik</from>
...
</memo></programlisting>
</figure>
<para>Some terms are helpful in the context of schemas:</para>
<variablelist>
<varlistentry>
<term>Validating / non-validating:</term>
<listitem>
<para>A non-validating parser only checks a document for
well-formedness. If it also checks XML documents for conformance to
a schema it is a <emphasis>validating</emphasis> parser.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Valid / invalid documents:</term>
<listitem>
<para>An XML document referencing a schema may either be valid
or invalid depending on its conformance to the schema in
question.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Document instance:</term>
<listitem>
<para>An XML memo document may conform to the <link
linkend="figure_memo_dtd">memo schema</link>. In this case we
call it a <emphasis>document instance</emphasis> of the memo
schema.</para>
<para>This situation is quite similar to that in typed programming
languages: A <link
linkend="gloss_Java"><trademark>Java</trademark></link>
<code>class</code> declaration is a blueprint for the <link
linkend="gloss_Java"><trademark>Java</trademark></link> runtime
system to construct <link
linkend="gloss_Java"><trademark>Java</trademark></link> objects
in memory. This is done by e.g. the statement <code>String name =
new String();</code>. The identifier <code>name</code> will hold
a reference to an <emphasis>instance of class String</emphasis>.
So in a <link
linkend="gloss_Java"><trademark>Java</trademark></link> runtime
environment a class declaration plays the same role as a schema
declaration in XML. See also <xref
linkend="example_memoJavaClass"/>.</para>
</listitem>
</varlistentry>
</variablelist>
<para>For further discussions it is very useful to clearly distinguish
element definitions in a schema from their
<emphasis>realizations</emphasis> in a corresponding document
instance: Our memo schema defines an element <tag
class="starttag">from</tag> to be of content <type>xs:string</type>.
According to the schema exactly one <tag class="starttag">from</tag>
element must appear in a valid (conforming) document instance. If we
were talking about HTML document instances we would prefer to talk
about a <tag class="starttag">from</tag> <emphasis>tag</emphasis>
rather than a <tag class="starttag">from</tag>
<emphasis>element</emphasis>.</para>
<para>In this document we will use the term <emphasis>element
type</emphasis> to denote an <code><xs:element ...</code>
definition in a schema. Thus we will talk about an element type <tag
class="element">subject</tag> being defined in
<filename>memo.xsd</filename>.</para>
<para>An element type defined in a <abbrev
xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schema</abbrev>
may have realizations in document instances. For example the document
instance shown in <xref linkend="memo_external_dtd"/> has two
<emphasis>nodes</emphasis> of element type <tag
class="element">to</tag>. Thus we say that the document instance
contains two <emphasis>element nodes</emphasis> of type <tag
class="element">to</tag>. We will frequently abbreviate this by saying
the instance contains two <tag class="starttag">to</tag> element
nodes. And we may even omit the term <emphasis>nodes</emphasis> and
simply talk about two <tag class="starttag">to</tag> elements. But
the careful reader should always distinguish between a single type
<code>foo</code> defined in a <abbrev
xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schema</abbrev>
and the possibly empty set of <tag class="starttag">foo</tag> nodes
appearing in valid document instances.</para>
<para><abbrev
xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">Schemas</abbrev>
add a layer of constraints on top of well-formed XML documents:</para>
<figure xml:id="wellformedandvalid">
<title>Well-formed and valid documents</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/wellformedandvalid.fig" scale="65"/>
</imageobject>
</mediaobject>
</figure>
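<para>The two layers can also be observed programmatically. The
following minimal sketch relies on the standard
<code>javax.xml.parsers</code> and <code>javax.xml.validation</code>
APIs. The class <classname>WellFormedVsValid</classname>, the miniature
<code>note</code> schema and the three sample strings are assumptions
made for illustration only and are not part of the memo example:</para>
<programlisting language="java">import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;

class WellFormedVsValid {

    // Hypothetical miniature schema: a <note> holding exactly one <text>.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='note'><xs:complexType><xs:sequence>"
      + "<xs:element name='text' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** True if a non-validating parser accepts the text as XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) { // SAXParseException on malformed input
            return false;
        }
    }

    /** True if the text is well-formed and conforms to SCHEMA. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)));
            schema.newValidator()
                .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) { // SAXException on any violation
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<note><text>Hello</text></note>", // well-formed and valid
            "<note><other/></note>",           // well-formed, but not valid
            "<note><text>Hello</note>"         // not well-formed
        };
        for (String xml : samples) {
            System.out.println(xml + " wellFormed=" + isWellFormed(xml)
                + " valid=" + isValid(xml));
        }
    }
}</programlisting>
<para>Every valid document passes both checks, whereas a well-formed
document may still fail the second one.</para>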
<qandaset defaultlabel="qanda" xml:id="example_memoTestValid">
<title>Validation of memo document instances.</title>
<qandadiv>
<qandaentry>
<question>
<para>Copy the two files <link
xlink:href="Ref/src/Memo.1/message.xml">message.xml</link> and
<link xlink:href="Ref/src/Memo.1/memo.xsd">memo.xsd</link>
into your Eclipse project. Use the Oxygen XML plug-in to check
whether the document is valid. Then apply and revert each of the
following changes in turn, checking the document for
validity each time:</para>
<itemizedlist>
<listitem>
<para>Omit the <tag class="starttag">from</tag>
element.</para>
</listitem>
<listitem>
<para>Change the order of the two sub elements <tag
class="starttag">subject</tag> and <tag
class="starttag">content</tag>.</para>
</listitem>
<listitem>
<para>Erase the <varname>date</varname> attribute and its
value.</para>
</listitem>
<listitem>
<para>Erase the <varname>priority</varname> attribute and
its value.</para>
</listitem>
</itemizedlist>
<para>What do you observe?</para>
</question>
<answer>
<para>The <tag class="attribute">priority</tag> attribute is
declared as <code>optional</code> and may thus be omitted.
Erasing the <tag class="attribute">priority</tag> attribute
thus leaves the document in a valid state. The remaining three
edit actions yield an invalid document instance.</para>
</answer>
</qandaentry>
<qandaentry xml:id="example_memoJavaClass">
<question>
<label>A memo implementation sketch in Java</label>
<para>The aim of this exercise is to clarify the (abstract)
relation between XML <abbrev
xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schemas</abbrev>
and sets of <link
linkend="gloss_Java"><trademark>Java</trademark></link>
classes rather than building a running application. We want to
model the <link xlink:href="Ref/src/Memo.1/memo.xsd">memo
schema</link> as a set of <link
linkend="gloss_Java"><trademark>Java</trademark></link>
classes.</para>
</question>
<answer>
<para>The XML attributes <tag class="attribute">date</tag> and
<tag class="attribute">priority</tag> can be mapped to <link
linkend="gloss_Java"><trademark>Java</trademark></link>
attributes. The same applies to the Memo elements <tag
class="element">from</tag>, <tag class="element">subject</tag>
and <tag class="element">content</tag> which may be
implemented as simple Strings or alternatively as separate
classes wrapping the String content. The latter method of
implementation should be preferred if the Memo schema is
expected to grow in complexity. A simple sketch reads:</para>
<programlisting language="none">import java.util.Date;
import java.util.LinkedHashSet;
import java.util.Set;

public class Memo {
    private Date date;
    private Priority priority = Priority.medium;
    private String from, subject, content;
    // Free of duplicates, iterates in insertion order
    private Set<String> to = new LinkedHashSet<>();
    // Accessors not yet implemented
}</programlisting>
<para>The only thing to note here is the implementation of the
<tag class="element">to</tag> element: We want to be able to
address a <emphasis>set</emphasis> of recipients. Thus we have
to disallow duplicates. Note that this is an
<emphasis>informal</emphasis> constraint not being handled by
our schema: A Memo document instance <emphasis>may</emphasis>
have duplicate content in <tag class="starttag">to</tag>
nodes. As it stands our schema imposes no uniqueness constraint
on the content of these nodes. In XML Schema this would require
an explicit <code>xs:unique</code> identity constraint.</para>
<para>On the other hand our set of recipients has to be
ordered: In an XML document instance the order of <tag
class="starttag">to</tag> nodes is important and has to be
preserved in a <link
linkend="gloss_Java"><trademark>Java</trademark></link>
representation. A <classname>java.util.SortedSet</classname>
would rearrange the recipients by their natural (alphabetical)
order. Thus we choose a
<classname>java.util.LinkedHashSet</classname> parametrized with
String type: It ignores duplicates and keeps the insertion order,
fulfilling both requirements.</para>
<para>Our schema defines:</para>
<programlisting language="none"><!ATTLIST memo ... priority (low|medium|high) #IMPLIED></programlisting>
<para>Starting from <link
linkend="gloss_Java"><trademark>Java</trademark></link> 1.5 we
may implement this constraint by a type safe enumeration in a
file <filename>Priority.java</filename>:</para>
<programlisting language="none">public enum Priority {low, medium, high}</programlisting>
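<para>When choosing a collection for the recipients it may help to
compare how two standard set implementations treat the same input. The
following is a minimal sketch. The class
<classname>RecipientOrder</classname> and the recipient names are
hypothetical and serve illustration purposes only:</para>
<programlisting language="java">import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class RecipientOrder {

    /** Drops duplicates and keeps the order of first appearance. */
    static Set<String> inDocumentOrder(List<String> recipients) {
        return new LinkedHashSet<>(recipients);
    }

    /** Drops duplicates and sorts by natural (alphabetical) order. */
    static Set<String> sorted(List<String> recipients) {
        return new TreeSet<>(recipients); // TreeSet implements SortedSet
    }

    public static void main(String[] args) {
        // Hypothetical content of three <to> nodes, one being a duplicate
        List<String> to = Arrays.asList("Zoe", "Adam", "Zoe");
        System.out.println(inDocumentOrder(to)); // prints [Zoe, Adam]
        System.out.println(sorted(to));          // prints [Adam, Zoe]
    }
}</programlisting>
<para>Both variants discard the duplicate, but only the
insertion-ordered set reproduces the sequence of <tag
class="starttag">to</tag> nodes as they appear in the document
instance.</para>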
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<para>In the following chapters we will extend the memo document type
(<code><!DOCTYPE memo ... ></code>) to demonstrate various
concepts of <abbrev
xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schemas</abbrev>
and other XML-related standards. In parallel a series of exercises
deals with building a schema usable to edit books. This schema gets
extended as our knowledge about XML advances. We start with an initial
exercise:</para>
<qandaset defaultlabel="qanda" xml:id="example_bookDtd">
<title>A schema for editing books</title>
<qandadiv>
<qandaentry>
<question>
<para>Write a schema describing book document instances with
the following features:</para>
<itemizedlist>
<listitem>
<para>A book shall have a title to describe the book
itself.</para>
</listitem>
<listitem>
<para>A book shall have at least one but possibly a
sequence of chapters.</para>
</listitem>
<listitem>
<para>Each chapter shall have a title and at least one
paragraph.</para>
</listitem>
<listitem>
<para>The titles and paragraphs shall consist of ordinary
text.</para>
</listitem>
</itemizedlist>
</question>
<answer>
<para>A possible schema looks like:</para>
<figure xml:id="figure_book.dtd_v1">
<title>A first schema version for book documents</title>
<programlisting language="none"><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified"
vc:minVersion="1.0" vc:maxVersion="1.1">
<xs:element name="book">
<xs:complexType>
<xs:sequence>
<xs:element ref="title"/>
<xs:element ref="chapter" minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="title" type="xs:string"/>
<xs:element name="chapter">
<xs:complexType>
<xs:sequence>
<xs:element ref="title"/>
<xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="para" type="xs:string"/>
</xs:schema></programlisting>
</figure>
<para>We supply a valid document instance:</para>
<informalfigure xml:id="bookInitialInstance">
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="book.xsd">
<title>Introduction to Java</title>
<chapter>
<title>Introduction</title>
<para>Java is a programming language</para>
</chapter>
<chapter>
<title>The virtual machine</title>
<para>We also call it the runtime system.</para>
</chapter>
<chapter>
<title>Annotations</title>
<para>Annotations provide a means to add meta information.</para>
<para>This is especially useful for framework authors.</para>
</chapter>
</book></programlisting>
</informalfigure>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="dtdVsSqlDdl">
<title>Relating <abbrev
xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schemas</abbrev>
and <acronym
xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> - <abbrev
xlink:href="http://en.wikipedia.org/wiki/Data_definition_language">DDL</abbrev></title>
<para>XML <abbrev
xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schemas</abbrev>
and <acronym
xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> - <abbrev
xlink:href="http://en.wikipedia.org/wiki/Data_definition_language">DDL</abbrev>
are related: Both describe data models and thus integrity
constraints. We consider a simple invoice example:</para>
<figure xml:id="invoiceIntegrity">
<title>Invoice integrity constraints</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/invoicedata.fig" scale="65"/>
</imageobject>
</mediaobject>
</figure>
<para>A relational implementation may look like:</para>
<figure xml:id="invoiceSqlDdl">
<title>Relational implementation</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/invoicedataimplement.fig" scale="65"/>
</imageobject>
</mediaobject>
</figure>
<qandaset defaultlabel="qanda" xml:id="qandaInvoiceSchema">
<title>An XML schema representing invoices</title>
<qandadiv>
<qandaentry>
<question>
<para>Represent the relational schema described in <xref
linkend="invoiceSqlDdl"/> by an XML Schema and provide an
appropriate instance example.</para>
</question>
<answer>
<para>A possible schema implementation:</para>
<programlisting language="none"><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified"
vc:minVersion="1.0" vc:maxVersion="1.1">
<xs:simpleType name="money">
<xs:restriction base="xs:decimal">
<xs:fractionDigits value="2"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="data">
<xs:complexType>
<xs:sequence>
<xs:element ref="customer" maxOccurs="unbounded"/>
<xs:element ref="invoice" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:key name="customerId">
<xs:selector xpath="customer"/>
<xs:field xpath="@id"/>
</xs:key>
<xs:keyref refer="customerId" name="customerToInvoice">
<xs:selector xpath="invoice"/>
<xs:field xpath="@customer"></xs:field>
</xs:keyref>
</xs:element>
<xs:element name="customer">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="phoneNumber" type="xs:string" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="id" type="xs:int" use="required"/>
</xs:complexType>
</xs:element>
<xs:element name="invoice">
<xs:complexType>
<xs:sequence>
<xs:element name="amount" type="money"/>
<xs:element name="status">
<xs:simpleType>
<xs:restriction base="xs:token">
<xs:enumeration value="open"/>
<xs:enumeration value="due"/>
<xs:enumeration value="cleared"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
<xs:attribute name="customer" type="xs:int" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema></programlisting>
<para>An example data set:</para>
<programlisting language="none"><data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="invoice.xsd">
<customer id="5">
<name>Clarke Jefferson</name>
</customer>
<invoice customer="5">
<amount>33.12</amount>
<status>due</status>
</invoice>
</data></programlisting>
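<para>The effect of the <code>xs:key</code> /
<code>xs:keyref</code> pair may be tried out with the
<code>javax.xml.validation</code> API. The sketch below uses a
deliberately reduced variant of the schema above keeping only the two
attributes involved. The class
<classname>InvoiceKeyrefCheck</classname>, the reduced schema and the
sample documents are assumptions made for illustration:</para>
<programlisting language="java">import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;

class InvoiceKeyrefCheck {

    // Reduced, hypothetical variant of the invoice schema: child elements
    // are dropped, only customer/@id (key) and invoice/@customer (keyref)
    // remain.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='data'>"
      + " <xs:complexType><xs:sequence>"
      + "  <xs:element name='customer' maxOccurs='unbounded'>"
      + "   <xs:complexType>"
      + "    <xs:attribute name='id' type='xs:int' use='required'/>"
      + "   </xs:complexType>"
      + "  </xs:element>"
      + "  <xs:element name='invoice' maxOccurs='unbounded'>"
      + "   <xs:complexType>"
      + "    <xs:attribute name='customer' type='xs:int' use='required'/>"
      + "   </xs:complexType>"
      + "  </xs:element>"
      + " </xs:sequence></xs:complexType>"
      + " <xs:key name='customerId'>"
      + "  <xs:selector xpath='customer'/><xs:field xpath='@id'/>"
      + " </xs:key>"
      + " <xs:keyref name='customerToInvoice' refer='customerId'>"
      + "  <xs:selector xpath='invoice'/><xs:field xpath='@customer'/>"
      + " </xs:keyref>"
      + "</xs:element>"
      + "</xs:schema>";

    /** True if the document satisfies SCHEMA including key and keyref. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)))
                .newValidator()
                .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) { // SAXException signals a violation
            return false;
        }
    }

    public static void main(String[] args) {
        // The reference matches an existing customer
        System.out.println(isValid(
            "<data><customer id='5'/><invoice customer='5'/></data>"));
        // Dangling reference: there is no customer 6
        System.out.println(isValid(
            "<data><customer id='5'/><invoice customer='6'/></data>"));
        // Duplicate key value
        System.out.println(isValid(
            "<data><customer id='5'/><customer id='5'/>"
            + "<invoice customer='5'/></data>"));
    }
}</programlisting>
<para>The second and third document correspond to a violated foreign key
and a violated primary key in the relational model, respectively.</para>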
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="airlineXsd">
<title>The airline example revisited</title>
<qandaset defaultlabel="qanda" xml:id="qandaAirlineXsd">
<title>Airline meta information by XML schema</title>
<qandadiv>
<qandaentry>
<question>
<para>Transform the relational schema from <xref
linkend="airlineRelationalSchema"/> into an XML schema and
supply some test data. In particular consider the following
constraints:</para>
<itemizedlist>
<listitem>
<para>Data types</para>
<itemizedlist>
<listitem>
<para><link
xlink:href="http://en.wikipedia.org/wiki/List_of_airline_codes">ICAO
airline designator</link></para>
</listitem>
<listitem>
<para><link
xlink:href="http://en.wikipedia.org/wiki/International_Civil_Aviation_Organization_airport_code">ICAO
airport code</link></para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>Primary / Unique key definitions</para>
</listitem>
<listitem>
<para>Foreign key definitions</para>
</listitem>
<listitem>
<para>CHECK constraint: Your XML schema will require <tag
class="starttag">xs:assert test="..." </tag> and thus XML
schema version 1.1. You may want to read about
co-occurrence constraints as described in <link
xlink:href="http://www.ibm.com/developerworks/library/x-xml11pt2">Listing
6. Assertion on complex type - @height <
@width</link>.</para>
</listitem>
</itemizedlist>
<para>The following XML example instance may guide you towards
an <filename>airline.xsd</filename> schema:</para>
<programlisting language="none"><top xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="airline.xsd">
<airlines>
<airline airlineCode="DLH" id="1">
<name>Lufthansa</name>
</airline>
<airline airlineCode="AFR" id="2">
<name>Air France</name>
</airline>
</airlines>
<destinations>
<destination id="1" airportCode="EDDF">
<fullName>Frankfurt International Airport – Frankfurt am Main</fullName>
</destination>
<destination id="3" airportCode="EBCI">
<fullName>Brussels South Charleroi Airport – Charleroi</fullName>
</destination>
</destinations>
<flights>
<flight id="1" airline="1" origin="1" destination="3">
<flightNumber>LH 4234</flightNumber>
</flight>
</flights>
</top></programlisting>
<para>Hints:</para>
<itemizedlist>
<listitem>
<para>Identify all relational schema constraints from the
solution of <xref linkend="airlineRelationalSchema"/> and
model them accordingly.</para>
</listitem>
<listitem>
<para>The above example does not contain any constraint
violations. To test your schema for completeness it may
help to tinker with primary key, unique and referencing
attribute values.</para>
</listitem>
</itemizedlist>
</question>
<answer>
<programlisting language="none"><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified"
vc:minVersion="1.1">
<xs:simpleType name="ICAOAirportCode">
<xs:restriction base="xs:string">
<xs:length value="4" />
<xs:pattern value="[A-Z0-9]+"></xs:pattern>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="ICAOAirlineCode">
<xs:restriction base="xs:string">
<xs:length value="3"/>
<xs:pattern value="[A-Z]+"></xs:pattern>
</xs:restriction>
</xs:simpleType>
<xs:element name="top">
<xs:complexType>
<xs:sequence>
<xs:element ref="airlines"/>
<xs:element ref="destinations"/>
<xs:element ref="flights"/>
</xs:sequence>
</xs:complexType>
<xs:keyref name="_FK_Flight_airline" refer="_PK_Airline_id">
<xs:selector xpath="flights/flight"/>
<xs:field xpath="@airline"/>
</xs:keyref>
<xs:keyref name="_FK_Flight_origin" refer="_PK_Destination_id">
<xs:selector xpath="flights/flight"/>
<xs:field xpath="@origin"/>
</xs:keyref>
<xs:keyref name="_FK_Flight_destination" refer="_PK_Destination_id">
<xs:selector xpath="flights/flight"/>
<xs:field xpath="@destination"/>
</xs:keyref>
</xs:element>
<xs:element name="airlines">
<xs:complexType>
<xs:sequence>
<xs:element ref="airline" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:key name="_PK_Airline_id">
<xs:selector xpath="airline"/>
<xs:field xpath="@id"/>
</xs:key>
<xs:key name="_UN_Airline_name">
<xs:selector xpath="airline"/>
<xs:field xpath="name"/>
</xs:key>
<xs:key name="_UN_Airline_airlineCode">
<xs:selector xpath="airline"/>
<xs:field xpath="@airlineCode"/>
</xs:key>
</xs:element>
<xs:element name="airline">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
</xs:sequence>
<xs:attribute name="id" type="xs:int" use="required"/>
<xs:attribute name="airlineCode" type="ICAOAirlineCode" use="required"/>
</xs:complexType>
</xs:element>
<xs:element name="destinations">
<xs:complexType>
<xs:sequence>
<xs:element ref="destination" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:key name="_PK_Destination_id">
<xs:selector xpath="destination"/>
<xs:field xpath="@id"/>
</xs:key>
<xs:key name="_UN_Destination_airportCode">
<xs:selector xpath="destination"/>
<xs:field xpath="@airportCode"/>
</xs:key>
</xs:element>
<xs:element name="destination">
<xs:complexType>
<xs:sequence>
<xs:element name="fullName"/>
</xs:sequence>
<xs:attribute name="id" type="xs:int"/>
<xs:attribute name="airportCode" type="ICAOAirportCode"/>
</xs:complexType>
</xs:element>
<xs:element name="flights">
<xs:complexType>
<xs:sequence>
<xs:element ref="flight" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:key name="_PK_Flight_id">
<xs:selector xpath="flight"/>
<xs:field xpath="@id"/>
</xs:key>
<xs:key name="_UN_Flight_flightNumber">
<xs:selector xpath="flight"/>
<xs:field xpath="flightNumber"/>
</xs:key>
</xs:element>
<xs:element name="flight">
<xs:complexType>
<xs:sequence>
<xs:element name="flightNumber" type="xs:string"/>
</xs:sequence>
<xs:attribute name="id" type="xs:int" use="required"/>
<xs:attribute name="airline" type="xs:int" use="required"/>
<xs:attribute name="origin" type="xs:int"/>
<xs:attribute name="destination" type="xs:int"/>
<xs:assert test="not(@origin = @destination)">
<xs:annotation>
<xs:documentation>CHECK constraint _CK_Flight_origin_destination</xs:documentation>
</xs:annotation>
</xs:assert>
</xs:complexType>
</xs:element>
</xs:schema></programlisting>
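<para>The <code>test</code> expression of the
<code>xs:assert</code> element can be explored in isolation using the
<code>javax.xml.xpath</code> API. This is only a sketch under
assumptions: The JDK's XPath engine implements XPath 1.0 and compares
the attribute values as strings, whereas XML Schema 1.1 assertions use
XPath 2.0 and typed values. For the simple integer values used here both
are expected to agree. The class <classname>FlightCheck</classname> and
the sample elements are hypothetical:</para>
<programlisting language="java">import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class FlightCheck {

    /**
     * Evaluates the assertion's test against a single <flight> element.
     * "/flight" selects the element, the predicate repeats the test
     * attribute of the xs:assert element.
     */
    static boolean originDiffersFromDestination(String flightXml)
            throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Boolean) xpath.evaluate(
            "boolean(/flight[not(@origin = @destination)])",
            new InputSource(new StringReader(flightXml)),
            XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        // Satisfies the CHECK constraint
        System.out.println(originDiffersFromDestination(
            "<flight id='1' airline='2' origin='1' destination='3'/>"));
        // Violates the CHECK constraint: round trip to the same airport
        System.out.println(originDiffersFromDestination(
            "<flight id='2' airline='2' origin='3' destination='3'/>"));
    }
}</programlisting>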
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="xmlAndJava">
<title>Relating <abbrev
xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schemas</abbrev>
and <link linkend="gloss_Java"><trademark>Java</trademark></link>
class descriptions</title>
<para>We may also compare XML data constraints to <link
linkend="gloss_Java"><trademark>Java</trademark></link>. A <link
linkend="gloss_Java"><trademark>Java</trademark></link> class
declaration is actually a blueprint for a <trademark
xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark>
to instantiate compatible objects. Likewise an XML schema restricts
well-formed documents:</para>
<figure xml:id="fig_XmlAndJava">
<title>XML <abbrev
xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schemas</abbrev>
and <link linkend="gloss_Java"><trademark>Java</trademark></link>
class declarations</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/xmlattribandjava.fig" scale="65"/>
</imageobject>
</mediaobject>
</figure>
</section>
<section xml:id="xmlSchemaExercise">
<title>XML schema exercises</title>
<section xml:id="sectSchemaProductCatalog">
<title>A product catalog</title>
<qandaset defaultlabel="qanda" xml:id="quandaProductCatalog">
<title>Product catalog schema</title>
<qandadiv>
<qandaentry>
<question>
<para>Consider the following product catalog example:</para>
<programlisting language="none"><catalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="catalog.xsd">
<title>Outdoor products</title>
<introduction>
<para>We offer a great variety of basic stuff for mountaineering
such as ropes, harnesses and tents.</para>
<para>Our shop is proud of its large number of available
sleeping bags.</para>
</introduction>
<product id="x-223">
<title>Multi freezing bag Nightmare camper</title>
<description>
<para>You will feel comfortable till minus 20 degrees - At
least if you are a penguin or a polar bear.</para>
</description>
</product>
<product id="r-334">
<title>Rope 40m</title>
<description>
<para>Excellent for indoor climbing.</para>
</description>
</product>
</catalog></programlisting>
<para>As you may have inferred the following rules shall
apply for arbitrary catalog documents:</para>
<itemizedlist>
<listitem>
<para>Each <tag class="starttag">catalog</tag> shall
have exactly one <tag class="starttag">title</tag> and
<tag class="starttag">introduction</tag> element.</para>
</listitem>
<listitem>
<para><tag class="starttag">introduction</tag> and <tag
class="starttag">description</tag> shall have at least
one <tag class="starttag">para</tag> child.</para>
</listitem>
<listitem>
<para>Each <tag class="starttag">catalog</tag> shall
have at least one <tag
class="starttag">product</tag>.</para>
</listitem>
<listitem>
<para>Each <tag class="starttag">product</tag> shall
have exactly one <tag class="starttag">title</tag> and
exactly one <tag class="starttag">description</tag> child
element.</para>
</listitem>
<listitem>
<para>The required <code>id</code> attribute shall not
contain whitespace and be unique with respect to all
<tag class="starttag">product</tag> elements.</para>
</listitem>
<listitem>
<para>The optional <code>price</code> attribute of <tag
class="starttag">product</tag> shall represent money amounts.</para>
</listitem>
</itemizedlist>
<para>Provide a suitable <filename>catalog.xsd</filename>
schema.</para>
</question>
<answer>
<programlisting language="none"><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified"
vc:minVersion="1.0" vc:maxVersion="1.1">
<xs:simpleType name="money">
<xs:restriction base="xs:decimal">
<xs:fractionDigits value="2"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="title" type="xs:string"/>
<xs:element name="para" type="xs:string"/>
<xs:element name="description" type="paraSequence"/>
<xs:element name="introduction" type="paraSequence"/>
<xs:complexType name="paraSequence">
<xs:sequence>
<xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:element name="product">
<xs:complexType>
<xs:sequence>
<xs:element ref="title"/>
<xs:element ref="description"/>
</xs:sequence>
<xs:attribute name="id" type="xs:NMTOKEN" use="required"/>
<xs:attribute name="price" type="money" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="catalog">
<xs:complexType>
<xs:sequence>
<xs:element ref="title"/>
<xs:element ref="introduction"/>
<xs:element ref="product" minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:key name="uniqueProductId">
<xs:selector xpath="product"></xs:selector>
<xs:field xpath="@id"/>
</xs:key>
</xs:element>
</xs:schema></programlisting>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="sectQandaBookV1">
<title>Book like documents</title>
<qandaset defaultlabel="qanda" xml:id="example_operatorprecedence">
<title>Book documents with mixed content and itemized
lists</title>
<qandadiv>
<qandaentry xml:id="example_book_v2">
<question>
<para>Extend the first version of <link
linkend="example_bookDtd">book.xsd</link> to support the
following features:</para>
<itemizedlist>
<listitem>
<para>Within a <tag class="starttag">chapter</tag> node
<tag class="starttag">para</tag> and <tag
class="starttag">itemizedlist</tag> elements in
arbitrary order shall be allowed.</para>
</listitem>
<listitem>
<para><tag class="starttag">itemizedlist</tag> nodes
shall contain at least one <tag
class="starttag">listitem</tag>.</para>
</listitem>
<listitem>
<para><tag class="starttag">listitem</tag> nodes shall
be composed of one or more <tag class="starttag">para</tag> or
nested <tag class="starttag">itemizedlist</tag> elements.</para>
</listitem>
<listitem>
<para>Within a <tag class="starttag">para</tag> we want
to be able to emphasize text passages.</para>
</listitem>
</itemizedlist>
<para>The following sample document instance shall be
valid:</para>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="book.xsd">
<title>Introduction to Java</title>
<chapter>
<title>Introduction</title>
<para>Java supports <emphasis>lots</emphasis> of concepts:</para>
<itemizedlist>
<listitem>
<para>Single <emphasis>implementation</emphasis> inheritance.</para>
</listitem>
<listitem>
<para>Multiple <emphasis>interface</emphasis> inheritance.</para>
<itemizedlist>
<listitem><para>Built in types</para></listitem>
<listitem><para>User defined types</para></listitem>
</itemizedlist>
</listitem>
</itemizedlist>
</chapter>
</book></programlisting>
</question>
<answer>
<para>An extended schema looks like:</para>
<figure xml:id="paraListEmphasize">
<title>Version 2 of book.xsd</title>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified"
vc:minVersion="1.0" vc:maxVersion="1.1">
<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/03/xml.xsd" />
<xs:include schemaLocation="table.xsd"/>
<!-- Type definitions -->
<xs:simpleType name="languageType">
<xs:restriction base="xs:string">
<xs:enumeration value="en"/>
<xs:enumeration value="fr"/>
<xs:enumeration value="de"/>
<xs:enumeration value="it"/>
<xs:enumeration value="es"/>
</xs:restriction>
</xs:simpleType>
<!-- Elements having no inner structure -->
<xs:element name="emphasis" type="xs:string"/>
<xs:element name="title" type="xs:string"/>
<xs:element name="link">
<xs:complexType mixed="true">
<xs:attribute name="linkend" type="xs:IDREF" use="required"/>
</xs:complexType>
</xs:element>
<!-- Starting the game ... -->
<xs:element name="book">
<xs:complexType>
<xs:sequence>
<xs:element ref="title"/>
<xs:element ref="chapter" minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="lang" type="languageType" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="chapter">
<xs:complexType>
<xs:sequence> <co xml:id="figure_book.dtd_v2_chapter"/>
<xs:element ref="title"/>
<xs:choice minOccurs="1" maxOccurs="unbounded">
<xs:element ref="para"/>
<xs:element ref="itemizedlist"/>
<xs:element ref="table"/>
</xs:choice>
</xs:sequence>
<xs:attribute name="id" type="xs:ID" use="optional"/>
<xs:attribute ref="xml:base"/> <!-- This allows for <xi:include ...> -->
</xs:complexType>
</xs:element>
<xs:element name="para">
<xs:complexType mixed="true"> <co
xml:id="figure_book.dtd_v2_para"/>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="emphasis"/>
<xs:element ref="link"/>
</xs:choice>
<xs:attribute name="id" type="xs:ID" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="itemizedlist">
<xs:complexType>
<xs:sequence>
<xs:element ref="listitem" minOccurs="1" <co
xml:id="figure_book.dtd_v2_itemizedlist"/> maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="id" type="xs:ID" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="listitem">
<xs:complexType>
<xs:choice minOccurs="1" maxOccurs="unbounded"> <co
xml:id="figure_book.dtd_v2_listitem"/>
<xs:element ref="para"/>
<xs:element ref="itemizedlist"/>
</xs:choice>
</xs:complexType>
</xs:element>
</xs:schema></programlisting>
<caption>
<para>This allows emphasized text in <tag
class="starttag">para</tag> nodes and nested <tag
class="starttag">itemizedlist</tag> elements.</para>
</caption>
</figure>
<calloutlist>
<callout arearefs="figure_book.dtd_v2_chapter">
<para>We hook into <tag class="starttag">chapter</tag>
to allow arbitrary sequences of at least one <tag
class="starttag">para</tag> or <tag
class="starttag">itemizedlist</tag> element node.</para>
</callout>
<callout arearefs="figure_book.dtd_v2_para">
<para><tag class="starttag">para</tag> nodes now allow
mixed content.</para>
</callout>
<callout arearefs="figure_book.dtd_v2_itemizedlist">
<para>An <tag class="starttag">itemizedlist</tag>
contains at least one list item.</para>
</callout>
<callout arearefs="figure_book.dtd_v2_listitem">
<para>A <tag class="starttag">listitem</tag> contains a
sequence of at least one <tag
class="starttag">para</tag> or <tag
class="starttag">itemizedlist</tag> child node. The
latter gives rise to nested lists. We find a similar
construct in HTML, namely unordered lists defined by
<code><UL><LI>... </code>constructs.</para>
</callout>
</calloutlist>
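<para>Mixed content shows up in a parsed document as an
alternating sequence of text and element nodes. The following
minimal sketch uses the standard DOM API to list the children of a
single <tag class="starttag">para</tag> element. The class name
<classname>MixedContentWalk</classname> is hypothetical and the
sample paragraph is taken from the instance above:</para>
<programlisting language="java">import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MixedContentWalk {

    /** Describes each child of the root element as text or element. */
    static List<String> childKinds(String xml) throws Exception {
        Node root = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
        List<String> kinds = new ArrayList<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                kinds.add("text:" + child.getNodeValue());
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                kinds.add("element:" + child.getNodeName());
            }
        }
        return kinds;
    }

    public static void main(String[] args) throws Exception {
        // One text node, one <emphasis> element, one further text node
        for (String kind : childKinds(
                "<para>Java supports <emphasis>lots</emphasis>"
                + " of concepts:</para>")) {
            System.out.println(kind);
        }
    }
}</programlisting>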
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="sectQandaBookLang">
<title>Allow different languages</title>
<qandaset defaultlabel="qanda" xml:id="example_book.dtd_v3">
<title>book.xsd and languages</title>
<qandadiv>
<qandaentry>
<question>
<para>We want to extend our schema from <xref
linkend="example_book_v2"/> by allowing an author to define
the language to be used within the whole or parts of the
document in question. Add an attribute <code>lang</code> to
all relevant elements, e.g. <tag class="starttag">para
lang="es"</tag>. An XML editor may use this attribute to
activate corresponding dictionaries for spell
checking.</para>
<para>The <code>lang</code> attribute shall be restricted to
the following values:</para>
<itemizedlist>
<listitem>
<para><token>en</token></para>
</listitem>
<listitem>
<para><token>fr</token></para>
</listitem>
<listitem>
<para><token>de</token></para>
</listitem>
<listitem>
<para><token>it</token></para>
</listitem>
<listitem>
<para><token>es</token></para>
</listitem>
</itemizedlist>
</question>
<answer>
<para>We define a suitable <tag
class="starttag">xs:attribute</tag> type:</para>
<programlisting language="none"><xs:attribute <emphasis
role="bold">name="lang"</emphasis>>
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="en"/>
<xs:enumeration value="fr"/>
<xs:enumeration value="de"/>
<xs:enumeration value="it"/>
<xs:enumeration value="es"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute></programlisting>
<para>Then we add this attribute to our elements like <tag
class="starttag">chapter</tag> and others:</para>
<programlisting language="none"> <xs:element name="chapter">
<xs:complexType>
<xs:sequence> ... </xs:sequence>
<xs:attribute <emphasis role="bold">ref="lang"</emphasis> use="optional"/>
...
</xs:complexType>
</xs:element></programlisting>
<para>This allows us to set a language at an arbitrary
hierarchy level. But of course we may define it at top level
as well:</para>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<book ... lang="en">
<title>Introduction to Java</title>
...</programlisting>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="sectMixQuotes">
<title>Mixing attribute quotes</title>
<qandaset defaultlabel="qanda" xml:id="example_quotes">
<title>Single and double quotes reconsidered</title>
<qandadiv>
<qandaentry>
<question>
<para>We recall the problem of nested quotes yielding
XML code that is not well-formed:</para>
<programlisting language="none"><img src="bold.gif" alt="We may use "quotes" here" /></programlisting>
<para>The XML specification defines legal attribute value
definitions as:</para>
<productionset>
<title><link
xlink:href="http://www.w3.org/TR/2008/REC-xml-20081126/#d0e888">Literals</link></title>
<production xml:id="w3RecXml_NT-EntityValue">
<lhs>EntityValue</lhs>
<rhs>'"' ([^%&"] | <nonterminal
def="#w3RecXml_NT-PEReference">PEReference</nonterminal>
| <nonterminal
def="#w3RecXml_NT-Reference">Reference</nonterminal>)*
'"' | "'" ([^%&'] | <nonterminal
def="#w3RecXml_NT-PEReference">PEReference</nonterminal>
| <nonterminal
def="#w3RecXml_NT-Reference">Reference</nonterminal>)*
"'"</rhs>
</production>
<production xml:id="w3RecXml_NT-AttValue">
<lhs>AttValue</lhs>
<rhs>'"' ([^<&"] | <nonterminal
def="#w3RecXml_NT-Reference">Reference</nonterminal>)*
'"' | "'" ([^<&'] | <nonterminal
def="#w3RecXml_NT-Reference">Reference</nonterminal>)*
"'"</rhs>
</production>
<production xml:id="w3RecXml_NT-SystemLiteral">
<lhs>SystemLiteral</lhs>
<rhs>('"' [^"]* '"') | ("'" [^']* "'")</rhs>
</production>
<production xml:id="w3RecXml_NT-PubidLiteral">
<lhs>PubidLiteral</lhs>
<rhs>'"' <nonterminal
def="#w3RecXml_NT-PubidChar">PubidChar</nonterminal>*
'"' | "'" (<nonterminal
def="#w3RecXml_NT-PubidChar">PubidChar</nonterminal> -
"'")* "'"</rhs>
</production>
<production xml:id="w3RecXml_NT-PubidChar">
<lhs>PubidChar</lhs>
<rhs>#x20 | #xD | #xA | [a-zA-Z0-9]
| [-'()+,./:=?;!*#@$_%]</rhs>
</production>
</productionset>
<para>Find out how it is possible to set the attribute <tag
class="attribute">alt</tag>'s value to the string <code>We
may use "quotes" here</code>.</para>
</question>
<answer>
<para>The production rule for attribute values reads:</para>
<productionset>
<productionrecap linkend="w3RecXml_NT-AttValue"/>
</productionset>
<para>This allows us to use either of two alternatives to
delimit attribute values:</para>
<glosslist>
<glossentry>
<glossterm><tag class="starttag">img ...
alt="..."/</tag></glossterm>
<glossdef>
<para><emphasis>Well-formedness constraint:</emphasis> do not
use <code>"</code> inside the value string.</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm><tag class="starttag">img ...
alt='...'/</tag></glossterm>
<glossdef>
<para><emphasis>Well-formedness constraint:</emphasis> do not
use <code>'</code> inside the value string.</para>
</glossdef>
</glossentry>
</glosslist>
<para>We may take advantage of the second rule:</para>
<programlisting language="none"><img src="bold.gif" alt='We may use "quotes" here' /></programlisting>
<para>Notice that according to <xref
linkend="w3RecXml_NT-AttValue"/> the delimiting quotes must
not be mixed. The following code is thus not
well-formed:</para>
<programlisting language="none"><img src="bold.gif'/></programlisting>
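<para>The three cases can be checked with any XML parser. The following is a minimal sketch using Python's standard <code>xml.etree.ElementTree</code> module. The use of Python and the additional variant escaping the inner quotes as <code>&quot;</code> are illustrative assumptions and not part of the exercise:</para>

```python
import xml.etree.ElementTree as ET

# Single quotes delimit the value, so double quotes may appear inside it.
single = ET.fromstring("""<img src="bold.gif" alt='We may use "quotes" here' />""")

# Assumed alternative: keep double quotes as delimiters and escape the inner ones.
escaped = ET.fromstring("""<img src="bold.gif" alt="We may use &quot;quotes&quot; here" />""")

print(single.get("alt"))
print(single.get("alt") == escaped.get("alt"))

# Mixed delimiters: the value opened by " is never closed, the parser rejects it.
try:
    ET.fromstring("""<img src="bold.gif'/>""")
    mixed_ok = True
except ET.ParseError:
    mixed_ok = False
print(mixed_ok)
```

<para>Both well-formed variants yield the same attribute value, since the delimiting quotes are not part of the value itself.</para>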
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="qandasetInternalRef">
<title>Internal references</title>
<qandaset defaultlabel="qanda" xml:id="example_book.dtd_v5">
<title>book.xsd and internal references</title>
<qandadiv>
<qandaentry>
<question>
<para>We want to extend the schema from <xref
linkend="example_book.dtd_v3"/> to allow for document
internal references by:</para>
<itemizedlist>
<listitem>
<para>Allowing each <tag class="starttag">chapter</tag>,
<tag class="starttag">para</tag> and <tag
class="starttag">itemizedlist</tag> to become reference
targets.</para>
</listitem>
<listitem>
<para>Extending the element <tag
class="element">para</tag>'s mixed content model by a
new element <tag class="element">link</tag> with an
attribute <tag class="attribute">linkend</tag> being a
reference to a target.</para>
</listitem>
</itemizedlist>
</question>
<answer>
<para>We extend our schema:</para>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified"
vc:minVersion="1.0" vc:maxVersion="1.1">
<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/03/xml.xsd" />
<xs:include schemaLocation="table.xsd"/>
<!-- Type definitions -->
<xs:attribute name="lang">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="en"/>
<xs:enumeration value="fr"/>
<xs:enumeration value="de"/>
<xs:enumeration value="it"/>
<xs:enumeration value="es"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<!-- Elements having no inner structure -->
<xs:element name="emphasis" type="xs:string"/>
<xs:element name="title" type="xs:string"/>
<xs:element name="link">
<xs:complexType mixed="true"> <co
xml:id="progamlisting_book_v5_link"/>
<xs:attribute name="linkend" <co
xml:id="progamlisting_book_v5_link_linkend"/> type="xs:IDREF" use="required"/>
</xs:complexType>
</xs:element>
<!-- Starting the game ... -->
<xs:element name="book">
<xs:complexType>
<xs:sequence>
<xs:element ref="title"/>
<xs:element ref="chapter" minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute ref="lang" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="chapter">
<xs:complexType>
<xs:sequence>
<xs:element ref="title"/>
<xs:choice minOccurs="1" maxOccurs="unbounded">
<xs:element ref="para"/>
<xs:element ref="itemizedlist"/>
<xs:element ref="table"/>
</xs:choice>
</xs:sequence>
<xs:attribute ref="lang" use="optional"/>
<xs:attribute name="id" <co
xml:id="progamlisting_book_v5_chapter_id"/> type="xs:ID" use="optional"/>
<xs:attribute ref="xml:base"/> <!-- This allows for <xi:include ...> -->
</xs:complexType>
</xs:element>
<xs:element name="para">
<xs:complexType mixed="true"> <co
xml:id="progamlisting_book_v5_mixed_link"/>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="emphasis"/>
<xs:element ref="link"/>
</xs:choice>
<xs:attribute ref="lang" use="optional"/>
<xs:attribute name="id" <co
xml:id="progamlisting_book_v5_para_id"/> type="xs:ID" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="itemizedlist">
<xs:complexType>
<xs:sequence>
<xs:element ref="listitem" minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute ref="lang" use="optional"/>
<xs:attribute name="id" type="xs:ID" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="listitem">
<xs:complexType>
<xs:choice minOccurs="1" maxOccurs="unbounded">
<xs:element ref="para"/>
<xs:element ref="itemizedlist"/>
</xs:choice>
<xs:attribute ref="lang" use="optional"/>
</xs:complexType>
</xs:element>
</xs:schema></programlisting>
<calloutlist>
<callout arearefs="progamlisting_book_v5_chapter_id">
<para>Defining an attribute <tag
class="attribute">id</tag> of type <code>ID</code> for
the elements <tag class="element">chapter</tag>, <tag
class="element">para</tag> and <tag
class="element">itemizedlist</tag>. This enables an
author to define internal reference targets.</para>
</callout>
<callout arearefs="progamlisting_book_v5_mixed_link">
<para>A link is part of the element <tag
class="element">para</tag>'s mixed content model. Thus
an author may define internal references along with
ordinary text.</para>
</callout>
<callout arearefs="progamlisting_book_v5_link">
<para>As in HTML, a link may contain text. When converted
to HTML, the expected rendering is a hypertext
link.</para>
</callout>
<callout arearefs="progamlisting_book_v5_link_linkend">
<para>The attribute <tag class="attribute">linkend</tag>
holds the reference to an internal target being either a
<tag class="element">chapter</tag>, a <tag
class="element">para</tag> or an <tag
class="element">itemizedlist</tag>.</para>
</callout>
</calloutlist>
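<para>A validating parser enforces that each <code>IDREF</code> value matches some <code>ID</code> value within the same document. The following sketch mimics this check by hand using Python's standard <code>xml.etree.ElementTree</code> module, which does not validate against a schema. The book instance and the function name <code>dangling_links</code> are hypothetical and serve illustration only:</para>

```python
import xml.etree.ElementTree as ET

# Hypothetical book instance; element and attribute names follow the schema above.
BOOK = """<book>
  <title>Introduction to Java</title>
  <chapter id="intro">
    <title>General</title>
    <para id="p1">See <link linkend="intro">this chapter</link>
      or <link linkend="missing">a dangling reference</link>.</para>
  </chapter>
</book>"""

def dangling_links(root):
    """Return all linkend values that do not match any id attribute."""
    targets = {e.get("id") for e in root.iter() if e.get("id") is not None}
    return [link.get("linkend") for link in root.iter("link")
            if link.get("linkend") not in targets]

print(dangling_links(ET.fromstring(BOOK)))
```

<para>A schema-aware validator would reject this instance because of the second <tag class="element">link</tag>. The sketch merely reports the offending value.</para>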
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
</section>
</section>
</chapter>
<chapter xml:id="xsl">
<title>The Extensible Stylesheet Language XSL</title>
<para>XSL is a <link xlink:href="http://www.w3.org/Style/XSL">W3C
standard</link> which defines a language to transform XML documents into
the following output formats:</para>
<itemizedlist>
<listitem>
<para>Ordinary text, e.g. in <link
xlink:href="http://unicode.org">Unicode</link> encoding.</para>
</listitem>
<listitem>
<para>XML.</para>
</listitem>
<listitem>
<para>HTML</para>
</listitem>
<listitem>
<para>XHTML</para>
</listitem>
</itemizedlist>
<para>Transforming a source XML document into a target XML document may be
required if:</para>
<itemizedlist>
<listitem>
<para>The target document expresses similar semantics but uses a
different XML dialect i.e. different tag names.</para>
</listitem>
<listitem>
<para>The target document is only a view on the source document. We
may for example extract the chapter names from a <tag
class="starttag">book</tag> document to create a table of
contents.</para>
</listitem>
</itemizedlist>
<section xml:id="xsl_helloworld">
<title>A <quote>Hello, world</quote> <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> example</title>
<para>We start from an extended version of our
<filename>memo.xsd</filename>:</para>
<programlisting language="none"><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified"
vc:minVersion="1.0" vc:maxVersion="1.1">
<xs:element name="memo">
<xs:complexType>
<xs:sequence>
<xs:element name="from" type="Person"/>
<xs:element name="to" type="Person" minOccurs="1" maxOccurs="unbounded"/>
<xs:element name="subject" type="xs:string"/>
<xs:element ref="content"/>
</xs:sequence>
<xs:attribute name="date" type="xs:date" use="required"/>
<xs:attribute name="priority" type="Priority" use="optional"/>
</xs:complexType>
</xs:element>
<xs:complexType name="Person">
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="id" type="xs:ID"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
<xs:element name="content">
<xs:complexType>
<xs:sequence>
<xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="para">
<xs:complexType mixed="true">
<xs:sequence>
<xs:element ref="link" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="link">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="linkend" type="xs:IDREF"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
<xs:simpleType name="Priority">
<xs:restriction base="xs:string">
<xs:enumeration value="low"/>
<xs:enumeration value="medium"/>
<xs:enumeration value="high"/>
</xs:restriction>
</xs:simpleType>
</xs:schema></programlisting>
<para>This schema allows a memo's document content to be structured into
paragraphs. A paragraph may contain links either to the sender or to a
recipient.</para>
<figure xml:id="figure_memoref_instance">
<title>A memo document instance with an internal reference.</title>
<programlisting language="none"><memo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="memo.xsd"
date="2014-09-24" priority="high" >
<from <emphasis role="bold">id="goik"</emphasis>>Martin Goik</from>
<to>Adam Hacker</to>
<to id="eve">Eve Intruder</to>
<subject>Firewall problems</subject>
<content>
<para>Thanks for your excellent work.</para>
<para>Our firewall is definitely broken! This bug has been reported by
the <link <emphasis role="bold">linkend="goik"</emphasis>>sender</link>.</para>
</content>
</memo></programlisting>
</figure>
<para>We want to extract the sender's name from an arbitrary <tag
class="element">memo</tag> document instance. Using <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> this task can be
accomplished by a script <filename>memo2sender.xsl</filename>:</para>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output method="text"/>
<xsl:template match="/memo">
<xsl:value-of select="from"/>
</xsl:template>
</xsl:stylesheet></programlisting>
<para>Before examining this code more closely we first show its effect. We
need a piece of software called an <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor. It
reads both a <tag>memo</tag> document instance and a style sheet and
produces the following output:</para>
<programlisting language="none"><computeroutput>[goik@mupter Memoref]$ xml2xml message.xml memo2sender.xsl
Martin Goik</computeroutput></programlisting>
<para>The result is the sender's name <computeroutput>Martin
Goik</computeroutput>. We may sketch the transformation
principle:</para>
<figure xml:id="figure_xsl_principle">
<title>An <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor
transforming an XML document into a result using a stylesheet</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/xslconvert.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>The executable <filename>xml2xml</filename> defined at the MI
department is actually a script wrapping the <productname
xlink:href="http://saxon.sourceforge.net">Saxon XSLT
processor</productname>. We may also use the Eclipse/Oxygen plugin
replacing the shell command by a GUI <link
xlink:href="http://www.oxygenxml.com/doc/ug-editorEclipse/#topics/defining-new-transformation-scenario.html">as
described in the corresponding documentation</link>. Next we
examine the <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> example
code more closely:</para>
<programlisting language="none"><xsl:stylesheet <co
xml:id="programlisting_helloxsl_stylesheet"/> xmlns:xsl <co
xml:id="programlisting_helloxsl_namespace_abbv"/> ="http://www.w3.org/1999/XSL/Transform"
version="2.0" <co xml:id="programlisting_helloxsl_xsl_version"/> >
<xsl:output method="text" <co
xml:id="programlisting_helloxsl_method_text"/>/>
<xsl:template <co xml:id="programlisting_helloxsl_template"/> match <co
xml:id="programlisting_helloxsl_match"/> ="/memo">
<xsl:value-of <co xml:id="programlisting_helloxsl_value-of"/> select <co
xml:base="" xml:id="programlisting_helloxsl_valueof_select_att"/> ="from" />
</xsl:template>
</xsl:stylesheet></programlisting>
<calloutlist>
<callout arearefs="programlisting_helloxsl_stylesheet">
<para>The element stylesheet belongs to the namespace
<code>http://www.w3.org/1999/XSL/Transform</code>. This namespace is
<emphasis>represented</emphasis> by the literal
<literal>xsl</literal>. As an alternative we might also use <tag
class="starttag">stylesheet
xmlns="http://www.w3.org/1999/XSL/Transform"</tag> instead of <tag
class="starttag">xsl:stylesheet ...</tag>. The value of the
namespace itself gets defined next.</para>
</callout>
<callout arearefs="programlisting_helloxsl_namespace_abbv">
<para>The keyword <code>xmlns</code> is reserved by the <link
xlink:href="http://www.w3.org/TR/REC-xml-names/">Namespaces in
XML</link> specification. In <quote>pure</quote> XML the whole term
<code>xmlns:xsl</code> would simply define an attribute. In presence
of a namespace aware XML parser however the literal
<literal>xsl</literal> represents the attribute value <tag
class="attvalue">http://www.w3.org/1999/XSL/Transform</tag>. This
value <emphasis>must not</emphasis> be changed! Otherwise an XSL
processor will fail since it cannot distinguish XSL
instructions from other XML elements. An element <tag
class="starttag">stylesheet</tag> belonging to a different namespace
<code>http://someserver.org/SomeNamespace</code> may have to be
generated.</para>
</callout>
<callout arearefs="programlisting_helloxsl_xsl_version">
<para>The <link xlink:href="http://www.w3.org/TR/xslt20">XSL
standard</link> is still evolving. The version number identifies the
conformance level for the subsequent code.</para>
</callout>
<callout arearefs="programlisting_helloxsl_method_text">
<para>The <tag class="attribute">method</tag> attribute in the <link
xlink:href="http://www.w3.org/TR/xslt20/#element-output"><xsl:output></link>
element specifies the type of output to be generated. Depending on
this type we may also define indentation depths and/or encoding.
Allowed <tag class="attvalue">method</tag> values are:</para>
<glosslist>
<glossentry>
<glossterm>text</glossterm>
<glossdef>
<para>Ordinary text.</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm>html</glossterm>
<glossdef>
<para><link
xlink:href="http://www.w3.org/TR/html4">HTML</link>
markup.</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm>xhtml</glossterm>
<glossdef>
<para><link
xlink:href="http://www.w3.org/TR/xhtml1">XHTML</link> markup
differing from the former by e.g. the closing
<quote>/></quote> in <tag><img
src="..."/></tag>.</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm>xml</glossterm>
<glossdef>
<para>XML code. This is most commonly used to create views on
or different dialects of an XML document instance.</para>
</glossdef>
</glossentry>
</glosslist>
</callout>
<callout arearefs="programlisting_helloxsl_template">
<para>An <tag class="starttag">xsl:template</tag> defines the output
that will be created for document nodes being defined by a
selector.</para>
</callout>
<callout arearefs="programlisting_helloxsl_match">
<para>The attribute <tag class="attribute">match</tag> tells us for
which nodes of a document instance the given <tag
class="starttag">xsl:template</tag> is appropriate. In the given
example the value <code>/memo</code> tells us that the template is
only responsible for <tag class="element">memo</tag> nodes appearing
at top level i.e. being the root element of the document
instance.</para>
</callout>
<callout arch=""
arearefs="programlisting_helloxsl_value-of programlisting_helloxsl_valueof_select_att">
<para>A <tag class="element">value-of</tag> element writes content
to the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev>
process' output. In this example the <code>#PCDATA</code> content
from the element <tag class="element">from</tag> will be written to
the output.</para>
</callout>
</calloutlist>
</section>
<section xml:id="xpath">
<title><link xlink:href="http://www.w3.org/TR/xpath">XPath</link> and
node sets</title>
<para>The <acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> standard allows
us to retrieve node sets from XML documents by predicate-based queries.
Thus its role may be compared to <acronym
xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym>
<code>SELECT</code> ... <code>FROM</code> ...<code>WHERE</code> queries.
Some simple examples:</para>
<figure xml:id="fig_Xpath">
<title>Simple <acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym>
queries</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/xpath.fig" scale="65"/>
</imageobject>
</mediaobject>
</figure>
<para>We are now interested in a list of all recipients being defined in
a <tag class="element">memo</tag> element. We introduce the element <tag
class="element">xsl:for-each</tag> which iterates over a result set of
nodes:</para>
<figure xml:id="programlisting_tolist_xpath">
<title>Iterating over the list of recipient nodes.</title>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output method="text"/>
<xsl:template match="/" <co xml:id="programlisting_tolist_match_root"/>>
<xsl:for-each select="memo/to" <co
xml:id="programlisting_tolist_xpath_memo_to"/> >
<xsl:value-of select="." <co xml:id="programlisting_tolist_value_of"/> />
<xsl:text>,</xsl:text> <co
xml:id="programlisting_tolist_xsl_text"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet></programlisting>
</figure>
<calloutlist>
<callout arearefs="programlisting_tolist_match_root">
<para>This template matches the document node representing the XML
document instance as a whole, <emphasis>not</emphasis> the visible <tag
class="element"><memo></tag> node.</para>
</callout>
<callout arearefs="programlisting_tolist_xpath_memo_to">
<para>The <link xlink:href="http://www.w3.org/TR/xpath">XPath</link>
expression <tag class="attvalue">memo/to</tag> gets evaluated
starting from the invisible top level document node being the
context node. For the given document instance this will define a
result set containing both <tag class="element"><to></tag>
recipient nodes, see <xref
linkend="figure_memo_xpath_memo_to"/>.</para>
</callout>
<callout arearefs="programlisting_tolist_value_of">
<para>The dot <quote>.</quote> denotes the context node, here the
current <tag class="element">to</tag> element, whose <code>#PCDATA</code>
content is written to the output.</para>
</callout>
<callout arearefs="programlisting_tolist_xsl_text">
<para>A comma is appended. This is not quite correct since it should
be absent for the last element.</para>
</callout>
</calloutlist>
<figure xml:id="figure_recipientlist_trailing_comma">
<title>A list of recipients.</title>
<para>The <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> presented before
yields:</para>
<programlisting language="none"><computeroutput>Adam Hacker,Eve Intruder</computeroutput><emphasis
role="bold">,</emphasis></programlisting>
</figure>
<para>Right now we do not bother about the trailing <quote>,</quote>
after the last recipient. The surrounding
<code><xsl:text></code>,<code></xsl:text></code> elements
<emphasis>may</emphasis> be omitted. We encourage the reader to leave
them in place since they increase readability when a template's body
gets more complex. The element <tag class="starttag">xsl:text</tag> is
used to append static text to the output. This way we append a separator
after each recipient. We now discuss the role of the two attributes <tag
class="attribute">match="/"</tag> and <tag
class="attribute">select="memo/to"</tag>. Both are examples of so-called
<link xlink:href="http://www.w3.org/TR/xpath">XPath</link> expressions.
They allow us to define <emphasis>node sets</emphasis> being subsets of
the set of all nodes of a given document instance.</para>
<para>Conceptually <link
xlink:href="http://www.w3.org/TR/xpath">XPath</link> expressions may be
compared to the <acronym
xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> language the
latter allowing the retrieval of data<emphasis>sets</emphasis> from a
relational database. We illustrate the current example by a
figure:</para>
<figure xml:id="figure_memo_xpath_memo_to">
<title>Selecting node sets from <tag class="element">memo</tag>
document instances</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/memoxpath.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>This figure needs some explanation. We observe an additional node
<quote>above</quote> <tag class="starttag">memo</tag> being represented
as <quote>filled</quote>. This node represents the document instance as
a whole and has got <tag>memo</tag> as its only child. We will
rediscover this additional root node when we discuss the <abbrev
xlink:href="http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407">DOM</abbrev>
application programming interface.</para>
<para>As already mentioned the expression <code>memo/to</code> evaluates
to a <emphasis>set</emphasis> of nodes. In our example this set consists
of two nodes of type <tag class="starttag">to</tag> each of them
representing a recipient of the memo. We observe a subtle difference
between the two <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev>
expressions:</para>
<glosslist>
<glossentry>
<glossterm><code>match="/"</code></glossterm>
<glossdef>
<para>The expression starts with and actually consists of the string
<quote>/</quote>. Thus it can be called an
<emphasis>absolute</emphasis> <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression.
Like a file specification <filename>C:\dos\myprog.exe</filename>
it starts on top level and needs no further context information to
get evaluated.</para>
<para>An <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet
<emphasis>must</emphasis> have an <link
xlink:href="http://www.w3.org/TR/xslt20/#initiating">initial
context node</link> to start the transformation. This is achieved
by providing exactly one <tag class="starttag">xsl:template</tag>
with an absolute <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> value for
its <tag class="attribute">match</tag> attribute like <tag
class="attvalue">/memo</tag>.</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm><code>select="memo/to"</code></glossterm>
<glossdef>
<para>This expression can be compared to a
<emphasis>relative</emphasis> file path specification like e.g.
<filename>../images/hdm.gif</filename>. We need to add the base
(context) directory in order for a relative file specification to
become meaningful. If the base directory is
<filename>/home/goik/xml</filename> then this
<emphasis>relative</emphasis> file specification will address the
file <filename>/home/goik/images/hdm.gif</filename>.</para>
<para>Likewise we have to define a <emphasis>context</emphasis>
node if we want to evaluate a relative <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression.
In our example this is the root node. The XSL specification
introduces the term <link
xlink:href="http://www.w3.org/TR/xslt20/#context">evaluation
context</link> for this purpose.</para>
</glossdef>
</glossentry>
</glosslist>
<para>In order to explain relative <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expressions we
consider <code>content/para</code> starting from the (unique!) <tag
class="element">memo</tag> node:</para>
<figure xml:id="memoXpathPara">
<title>The node set represented by <code>content/para</code> starting
at the context node <tag class="starttag">memo</tag>.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/memorelativexpath.fig"/>
</imageobject>
<caption>
<para>The dashed lines represent the relative <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expressions
starting from the context node to each of the nodes in the result
set.</para>
</caption>
</mediaobject>
</figure>
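<para>The dependence of a relative expression on its context node can be tried out in a few lines. The following is a minimal sketch based on Python's standard <code>xml.etree.ElementTree</code> module, which supports only a subset of XPath and offers no separate document node. The context node is therefore the <tag class="element">memo</tag> root element, and the shortened memo instance is an assumption made for brevity:</para>

```python
import xml.etree.ElementTree as ET

memo = ET.fromstring("""<memo date="2014-09-24" priority="high">
  <from id="goik">Martin Goik</from>
  <to>Adam Hacker</to>
  <to id="eve">Eve Intruder</to>
  <subject>Firewall problems</subject>
  <content>
    <para>Thanks for your excellent work.</para>
    <para>Our firewall is definitely broken!</para>
  </content>
</memo>""")

# Relative expressions are evaluated against a context node, here <memo>.
recipients = [to.text for to in memo.findall("to")]
paragraphs = memo.findall("content/para")

# The same relative expression yields an empty set from a different context node.
content = memo.find("content")
from_content = content.findall("content/para")

print(recipients)          # ['Adam Hacker', 'Eve Intruder']
print(len(paragraphs))     # 2
print(len(from_content))   # 0
```

<para>Starting from <tag class="element">content</tag> the expression <code>content/para</code> looks for a child <tag class="element">content</tag> of <tag class="element">content</tag>, which does not exist. The result set is thus empty.</para>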
</section>
<section xml:id="xsl_important_elements">
<title>Some important <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> elements</title>
<section xml:id="xsl_if">
<title><tag class="starttag">xsl:if</tag></title>
<para>Sometimes we need conditional processing rules. We might want
to create a list of sender and recipients with a defined value for the
attribute <tag class="attribute">id</tag>. In the <link
linkend="figure_memoref_instance">given example</link> this is only
valid for the (unique) sender and the recipient <code><to
id="eve">Eve Intruder</to></code>. We assume this set of
persons shall be inserted into a relational database table
<code>Customer</code> consisting of two <code>NOT NULL</code> columns
<code>id</code> and <code>name</code>. Thus both values
<emphasis>must</emphasis> be specified and we must exclude <tag
class="starttag">from</tag> or <tag class="starttag">to</tag> nodes
with undefined <tag class="attribute">id</tag> attributes:</para>
<figure xml:id="programlisting_memo_export_sql">
<title>Exporting SQL statements.</title>
<programlisting language="none">...
<xsl:variable name="newline" <co xml:id="programlisting_xsl_if_definevar"/>> <!-- A newline \n -->
<xsl:text>
</xsl:text>
</xsl:variable>
<xsl:template match="/memo">
<xsl:for-each select="from|to" <co xml:id="programlisting_xsl_if_foreach"/>>
<xsl:if <emphasis role="bold">test="@id"</emphasis> <co
xml:id="programlisting_xsl_if_test"/>>
<xsl:text>INSERT INTO Customer (id, name) VALUES ('</xsl:text>
<xsl:value-of select="@id" <co
xml:id="programlisting_xsl_if_select_idattrib"/>/>
<xsl:text>', '</xsl:text>
<xsl:value-of select="." <co
xml:id="programlisting_xsl_if_selectcontent"/>/>
<xsl:text>')</xsl:text>
<xsl:value-of select="$newline" <co
xml:id="programlisting_xsl_if_usevar"/>/>
</xsl:if>
</xsl:for-each>
</xsl:template></programlisting>
<caption>
<para>We want to export data from XML documents to a database
server. For this purpose INSERT statements are being crafted from
an XML document containing relevant data.</para>
</caption>
</figure>
<calloutlist>
<callout arearefs="programlisting_xsl_if_definevar">
<para>Define a file local variable <code>newline</code>. Dealing
with text output frequently requires the insertion of newlines.
Due to the syntax of the <tag class="element">xsl:text</tag>
elements this tends to clutter the code.</para>
</callout>
<callout arearefs="programlisting_xsl_if_foreach">
<para>Iterate over the set of the sender node and all recipient
nodes.</para>
</callout>
<callout arearefs="programlisting_xsl_if_test">
<para>The attribute value of <tag class="attribute">test</tag>
will be <link
xlink:href="http://www.w3.org/TR/xslt20/#xsl-if">evaluated</link>
as a boolean. In this example it evaluates to <code>true</code>
iff the attribute <tag class="attribute">id</tag> is defined for
the context node. Since we are inside the <tag
class="element">xsl:for-each</tag> block all context nodes are
either of type <tag class="starttag">from</tag> or <tag
class="starttag">to</tag> and thus <emphasis>may</emphasis> have
an <tag class="attribute">id</tag> attribute.</para>
</callout>
<callout arearefs="programlisting_xsl_if_select_idattrib">
<para>The <tag class="attribute">id</tag> attribute's value is
copied to the output. The <quote>@</quote> character in
<code>select="@id"</code> tells the <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor to
read the value of an <emphasis>attribute</emphasis> with name <tag
class="attribute">id</tag> rather than the content of a nested
sub<emphasis>element</emphasis> like in <code><to
id="foo"><id>I am
nested!</id></to></code>.</para>
</callout>
<callout arearefs="programlisting_xsl_if_selectcontent">
<para>As stated earlier the dot <quote>.</quote> denotes the
current context element. In this example simply the
<code>#PCDATA</code> content is copied to the output.</para>
</callout>
<callout arearefs="programlisting_xsl_if_usevar">
<para>The <quote>$</quote> sign in front of <code>newline</code>
tells the <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor to
access the variable <varname>newline</varname> previously defined
in <coref linkend="programlisting_xsl_if_definevar"/> rather than
interpreting it as the name of a sub element or an
attribute.</para>
</callout>
</calloutlist>
<para>As expected the recipient entry <quote>Adam Hacker</quote> does
not appear due to the fact that no <tag class="attribute">id</tag>
attribute is defined in its <tag class="starttag">to</tag>
element:</para>
<programlisting language="none"><computeroutput>INSERT INTO Customer (id, name) VALUES ('goik', 'Martin Goik')
INSERT INTO Customer (id, name) VALUES ('eve', 'Eve Intruder')</computeroutput></programlisting>
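<para>For comparison the same filtering logic may be sketched in a general purpose language. The following Python fragment uses the standard <code>xml.etree.ElementTree</code> module. The function name <code>insert_statements</code> and the shortened memo instance are hypothetical. Like the stylesheet it does not escape quotes within names, so it is not suited for untrusted input:</para>

```python
import xml.etree.ElementTree as ET

memo = ET.fromstring("""<memo date="2014-09-24" priority="high">
  <from id="goik">Martin Goik</from>
  <to>Adam Hacker</to>
  <to id="eve">Eve Intruder</to>
</memo>""")

def insert_statements(memo):
    """Build one INSERT statement per from/to child carrying an id attribute."""
    statements = []
    for person in memo:                        # children in document order
        # Corresponds to select="from|to" combined with test="@id".
        if person.tag in ("from", "to") and "id" in person.attrib:
            statements.append(
                "INSERT INTO Customer (id, name) VALUES ('%s', '%s')"
                % (person.get("id"), person.text))
    return statements

print("\n".join(insert_statements(memo)))
```

<para>The explicit loop and condition mirror <tag class="element">xsl:for-each</tag> and <tag class="element">xsl:if</tag>. The recipient without an <tag class="attribute">id</tag> attribute is skipped.</para>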
<qandaset defaultlabel="qanda" xml:id="example_position_last">
<title>The XPath functions position() and last()</title>
<qandadiv>
<qandaentry>
<question>
<para>We return to our recipient list in <xref
linkend="figure_recipientlist_trailing_comma"/>. We are
interested in a list of recipients avoiding the trailing
comma:</para>
<programlisting language="none"><computeroutput>Adam Hacker,Eve Intruder</computeroutput></programlisting>
<para>We may use an <tag class="element">xsl:if</tag> to insert
a comma for all but the very last recipient node. This can be
achieved by using the <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev>
functions <link
xlink:href="http://www.w3.org/TR/xpath#function-position">position()</link>
and <link
xlink:href="http://www.w3.org/TR/xpath#function-last">last()</link>.
Hint: The comparison operator <quote><</quote> may be used
in <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> to
compare two integer numbers. However it must be escaped as
<code>&lt;</code> in order to be XML compatible.</para>
</question>
<answer>
<para>We have to exclude the comma for the last node of the
recipient list. If we have e.g. 10 recipients the function
<code>position()</code> will return integer values
starting at 1 and ending with 10. So for the last node the
comparison <code>10 < 10</code> will evaluate to
false:</para>
<programlisting language="none"><xsl:for-each select="memo/to">
<xsl:value-of select="."/>
<xsl:if test="position() &lt; last()">
<xsl:text>,</xsl:text>
</xsl:if>
</xsl:for-each></programlisting>
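<para>The roles of <code>position()</code> and <code>last()</code> may become clearer when spelled out as an ordinary loop. The following Python sketch is an illustration only. The function name <code>recipient_list</code> and the reduced memo instance are assumptions:</para>

```python
import xml.etree.ElementTree as ET

memo = ET.fromstring(
    "<memo><to>Adam Hacker</to><to>Eve Intruder</to></memo>")

def recipient_list(memo):
    """Join all recipient names by commas without a trailing comma."""
    recipients = memo.findall("to")
    last = len(recipients)                     # plays the role of last()
    out = []
    # position() counts from 1, hence start=1.
    for position, to in enumerate(recipients, start=1):
        out.append(to.text)
        if position < last:                    # test="position() &lt; last()"
            out.append(",")
    return "".join(out)

print(recipient_list(memo))   # Adam Hacker,Eve Intruder
```

<para>Idiomatic Python would rather use <code>",".join(...)</code>. The explicit counter is kept here to mirror the stylesheet.</para>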
</answer>
</qandaentry>
<qandaentry xml:id="example_avoid_xsl_if">
<question>
<label>Avoiding xsl:if</label>
<para>In <xref linkend="programlisting_memo_export_sql"/> we
used the <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> value
<quote>from|to</quote> to select the desired sender and
recipient nodes. Inside the <tag
class="element">xsl:for-each</tag> block we permitted only
those nodes which have an <tag class="attribute">id</tag>
attribute. These two steps may be combined into a single
<abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev>
expression obsoleting the <tag
class="element">xsl:if</tag>.</para>
</question>
<answer>
<para>We simply need a modified <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> in the
<tag class="element">for-each</tag>:</para>
<programlisting language="none"><xsl:for-each select="<emphasis
role="bold">from[@id]|to[@id]</emphasis>">
<xsl:text>INSERT INTO Customer (id, name) VALUES ('</xsl:text>
<xsl:value-of select="@id"/>
<xsl:text>', '</xsl:text>
<xsl:value-of select="."/>
<xsl:text>')</xsl:text>
<xsl:value-of select="$newline"/>
</xsl:for-each></programlisting>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="xsl_apply_templates">
<title><tag class="starttag">xsl:apply-templates</tag></title>
<para>We already used <tag class="element">xsl:for-each</tag> to
iterate over a list of element nodes. <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> offers a
different possibility for this purpose. The idea is to define the
formatting rules at a centralized location. So the solution to <xref
linkend="example_position_last"/> may be written in an equivalent way:</para>
<programlisting language="none"><xsl:template match="/">
<xsl:apply-templates select="memo/to" <co
xml:id="programlisting_apply_templates_apply"/>/>
</xsl:template>
<xsl:template match="to" <co xml:id="programlisting_apply_templates_match"/>>
<xsl:value-of select="."/>
<xsl:if test="<emphasis role="bold">position()</emphasis> &lt; <emphasis
role="bold">last()</emphasis>">
<xsl:text>,</xsl:text>
</xsl:if>
</xsl:template></programlisting>
<calloutlist>
<callout arearefs="programlisting_apply_templates_apply">
<para>Definition of the recipient node list. Each element of this
list shall be processed further.</para>
</callout>
<callout arearefs="programlisting_apply_templates_match">
<para>This template <emphasis>may</emphasis> be used by an XSL
processor to format nodes of type <tag class="starttag">to</tag>.
Since the processor is asked to do exactly this in <xref
linkend="programlisting_apply_templates_apply"/> the current
template will <emphasis>really</emphasis> be used in this
example.</para>
</callout>
</calloutlist>
<para>The procedure outlined above may have the following
advantages:</para>
<itemizedlist>
<listitem>
<para>Some elements may appear at different places of a given
document hierarchy. For example a <tag
class="starttag">title</tag> element is likely to appear as a
child of chapters, sections, tables, figures and so on. It may be
sufficient to define a single template with a
<code>match="title"</code> attribute which contains all rules
being required.</para>
</listitem>
<listitem>
<para>Sometimes the body of a <tag
class="starttag">xsl:for-each</tag> ... <tag
class="endtag">xsl:for-each</tag> spans multiple screens thus
limiting code readability. Factoring out the body into a template
may avoid this obstacle.</para>
</listitem>
</itemizedlist>
<para>This method is well known from programming languages: If the
code inside a loop is needed multiple times or reaches a painful line
count <emphasis>good</emphasis> programmers tend to define a separate
method. For example:</para>
<programlisting language="none">for (int i = 0; i < 10; i++){
if (a[i] < b[i]){
max[i] = b[i];
} else {
max[i] = a[i];
}
...
}</programlisting>
<para>Inside the loop's body the relative maximum value of two
variables gets computed. This may be needed at several locations and
thus it is convenient to centralize this code into a method:</para>
<programlisting language="none">// cf. <xsl:template match="...">
static int maximum(int a, int b){
if (a < b){
return b;
} else {
return a;
}
}
...
// cf. <xsl:apply-templates select="..."/>
for (int i = 0; i < 10; i++){
max[i] = maximum(a[i], b[i]);
}</programlisting>
<para>So far calling a static method in <link
linkend="gloss_Java"><trademark>Java</trademark></link> may be
compared to an <tag class="starttag">xsl:apply-templates</tag>. There
is however one big difference. In <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> the
<quote>method</quote> being called may not exist at all. A <tag
class="starttag">xsl:apply-templates</tag> instructs a processor to
format a set of nodes. It does not contain information about any rules
being defined to do this job:</para>
<programlisting language="none"><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output method="text"/>
<xsl:template match="/memo">
<xsl:apply-templates <emphasis role="bold">select="content"</emphasis>/>
</xsl:template>
</xsl:stylesheet></programlisting>
<para>Since no suitable template supplying rules for <tag
class="starttag">content</tag> nodes exists, an <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor uses a
default formatting rule instead:</para>
<programlisting language="none"><computeroutput>Thanks for your excellent work.Our firewall is definitely
broken! This bug has been reported by the sender.</computeroutput></programlisting>
<para>We observe that the <code>#PCDATA</code> content strings of the
element itself and all (recursive) sub elements get glued together
into one string. In most cases this is definitely not intended.
Omitting a necessary template is usually a programming error. It is
thus good programming practice during style sheet development to
define a special template catching forgotten rules:</para>
<programlisting language="none"><xsl:template match="/memo">
<xsl:apply-templates select="content"/>
</xsl:template>
<xsl:template match="*">
<xsl:message>
<xsl:text>Error: No template defined matching element '</xsl:text>
<xsl:value-of select="name(.)"/>
<xsl:text>'</xsl:text>
</xsl:message>
</xsl:template></programlisting>
<para>The <quote>*</quote> matches any element if there is no <link
xlink:href="http://www.w3.org/TR/xslt20/#conflict">better
matching</link> rule defined. Since we did not supply any template for
<tag class="starttag">content</tag> nodes at all this default template
will match nodes of type <tag class="starttag">content</tag>. The
function <code>name()</code> is predefined in <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> and returns the
element type name of a node. During the formatting process we will now
see the following warning message:</para>
<programlisting language="none"><computeroutput>Error: No template defined matching element 'content'</computeroutput></programlisting>
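<para>During development it may be preferable to stop the
transformation at the first forgotten rule. A minimal sketch,
assuming that aborting the run is acceptable in your tool chain,
uses the <tag class="attribute">terminate</tag> attribute of <tag
class="element">xsl:message</tag>:</para>
<programlisting language="none"><xsl:template match="*">
<!-- terminate="yes" aborts the transformation after emitting the message -->
<xsl:message terminate="yes">
<xsl:text>Error: No template defined matching element '</xsl:text>
<xsl:value-of select="name(.)"/>
<xsl:text>'</xsl:text>
</xsl:message>
</xsl:template></programlisting>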
<para>We note that for document nodes <tag
class="starttag">xyz</tag><code>foo</code><tag
class="endtag">xyz</tag> containing only <code>#PCDATA</code> a simple
<tag class="emptytag">xsl:apply-templates select="xyz"</tag> is
sufficient: An <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor uses
its default rule and copies the node's content <code>foo</code> to its
output.</para>
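<para>The default behaviour may be pictured as a set of built-in
templates. The following sketch paraphrases the built-in rules for
element, text and attribute nodes. A processor behaves as if these
rules were present with lowest precedence, so there is no need to
actually write them down:</para>
<programlisting language="none"><!-- Document root and element nodes: continue with all child nodes -->
<xsl:template match="*|/">
<xsl:apply-templates/>
</xsl:template>
<!-- Text and attribute nodes: copy the string value to the output -->
<xsl:template match="text()|@*">
<xsl:value-of select="."/>
</xsl:template></programlisting>
<para>This also explains the glued together string observed
earlier: The element rule descends recursively and the text rule
copies each text node it encounters.</para>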
<qandaset defaultlabel="qanda" xml:id="example_rdbms_person">
<title>Extending the export to an RDBMS</title>
<qandadiv>
<qandaentry>
<question>
<para>We assume that our RDBMS table <code>Customer</code>
from <xref linkend="programlisting_memo_export_sql"/> shall be
replaced by a table <code>Person</code>. We expect the senders
of memo documents to be employees of a given company.
Conversely the recipients of memos are expected to be
customers. Our <code>Person</code> table shall have a
<quote>tag</quote> like column named <code>type</code> having
exactly two allowed values <code>customer</code> or
<code>employee</code> being controlled by a <code>CHECK</code>
constraint, see <xref linkend="table_person"/>. Create a style
sheet generating the necessary SQL statements from a memo
document instance. Hint: Define two different templates for
<tag class="starttag">from</tag> and <tag
class="starttag">to</tag> nodes.</para>
</question>
<answer>
<para>We define two templates differing only in the static
string value for a person's type. The relevant <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> portion
reads:<programlisting language="none"><xsl:template match="/memo">
<xsl:apply-templates select="from|to"/>
</xsl:template>
<xsl:template match="from">
<xsl:text>INSERT INTO Person (name, type) VALUES('</xsl:text>
<xsl:value-of select="."/>
<xsl:text>', <emphasis role="bold">'employee'</emphasis>)</xsl:text>
<xsl:value-of select="$newline"/>
</xsl:template>
<xsl:template match="to">
<xsl:text>INSERT INTO Person (name, type) VALUES('</xsl:text>
<xsl:value-of select="."/>
<xsl:text>', <emphasis role="bold">'customer'</emphasis>)</xsl:text>
<xsl:value-of select="$newline"/>
</xsl:template></programlisting></para>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<table xml:id="table_person">
<title>The Person table</title>
<?dbhtml table-width="30%" ?>
<?dbfo table-width="40%" ?>
<tgroup cols="2">
<colspec colwidth="3*"/>
<colspec colwidth="2*"/>
<thead>
<row>
<entry>name</entry>
<entry>type</entry>
</row>
</thead>
<tbody>
<row>
<entry>Martin Goik</entry>
<entry>employee</entry>
</row>
<row>
<entry>Adam Hacker</entry>
<entry>customer</entry>
</row>
<row>
<entry>Eve intruder</entry>
<entry>customer</entry>
</row>
</tbody>
</tgroup>
</table>
</section>
<section xml:id="xsl_choose">
<title><tag class="starttag">xsl:choose</tag></title>
<para>We already described the <tag class="starttag">xsl:if</tag>
which can be compared to an <code>if(..){...}</code> statement in many
programming languages. The <tag class="starttag">xsl:choose</tag>
element can be compared to a chain of <code>else if</code> conditions
including an optional final <code>else</code> block being reached if
all boolean tests fail:</para>
<programlisting language="none">if (condition a){
... //block a
} else if (condition b){
... //block b
} ...
...
else {
... //code being reached when all conditions evaluate to false
}</programlisting>
<para>We want to generate a list of memo recipient names numbered by
Roman numerals up to 10. Higher numbers shall be displayed in
ordinary decimal notation:</para>
<programlisting language="none"><computeroutput>I:Adam Hacker
II:Eve intruder
III: ...
IV: ...
...</computeroutput></programlisting>
<para>Though <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> offers <link
xlink:href="http://www.w3.org/TR/xslt20/#convert">a better way</link>
we may generate these number literals by:</para>
<programlisting language="none"><xsl:template match="/memo">
<xsl:apply-templates select="to"/>
</xsl:template>
<xsl:template match="to">
<xsl:choose>
<xsl:when test="1 = position()">I</xsl:when>
<xsl:when test="2 = position()">II</xsl:when>
<xsl:when test="3 = position()">III</xsl:when>
<xsl:when test="4 = position()">IV</xsl:when>
<xsl:when test="5 = position()">V</xsl:when>
<xsl:when test="6 = position()">VI</xsl:when>
<xsl:when test="7 = position()">VII</xsl:when>
<xsl:when test="8 = position()">VIII</xsl:when>
<xsl:when test="9 = position()">IX</xsl:when>
<xsl:when test="10 = position()">X</xsl:when>
<xsl:otherwise>
<xsl:value-of select="position()"/>
</xsl:otherwise>
</xsl:choose>
<xsl:text>:</xsl:text>
<xsl:value-of select="."/>
<xsl:value-of select="$newline"/>
</xsl:template></programlisting>
<para>Note that this conversion is incomplete: If the number in
question is larger than 10 it will be formatted in ordinary decimal
style according to the <tag class="starttag">xsl:otherwise</tag>
clause.</para>
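<para>The <quote>better way</quote> mentioned above presumably
refers to <tag class="element">xsl:number</tag> and its <tag
class="attribute">format</tag> attribute. A minimal sketch, reusing
the <code>$newline</code> variable from the preceding examples,
might read:</para>
<programlisting language="none"><xsl:template match="to">
<!-- format="I" requests upper case Roman numerals for the given value -->
<xsl:number value="position()" format="I"/>
<xsl:text>:</xsl:text>
<xsl:value-of select="."/>
<xsl:value-of select="$newline"/>
</xsl:template></programlisting>
<para>In contrast to the <tag class="starttag">xsl:choose</tag>
version all positions are rendered as Roman numerals including
those above 10.</para>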
</section>
<section xml:id="section_html_book">
<title>A complete HTML formatting example</title>
<para>We now present a series of exercises showing how to format <tag
class="starttag">book</tag> document instances to XHTML. This is done
in a step by step manner each time showing corresponding code snippets
for our <filename>memo.xsd</filename>.</para>
<section xml:id="section_memo_to_list">
<title>Listing the recipients of a memo</title>
<para>In order to generate an XHTML <link
xlink:href="http://www.w3.org/TR/html401/struct/lists.html#h-10.2">list</link>
of all recipients of a <tag class="starttag">memo</tag> we have
to use <tag class="starttag">xsl:output method="xhtml"</tag> and
embed the required HTML tags in our <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style
sheet:</para>
<programlisting language="none"><xsl:output method="xhtml" indent="yes"/>
<xsl:template match="/memo">
<html>
<head>
<title>Recipient list</title>
</head>
<body>
<ul>
<xsl:apply-templates select="to"/>
</ul>
</body>
</html>
</xsl:template>
<xsl:template match="to">
<li>
<xsl:value-of select="."/>
</li>
</xsl:template></programlisting>
<para>Processing this style sheet for a <tag
class="starttag">memo</tag> document instance yields:</para>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<html>
<head>
<title>Recipient list</title>
</head>
<body>
<ul>
<li>Adam Hacker</li>
<li>Eve intruder</li>
</ul>
</body>
</html></programlisting>
<para>The generated Xhtml code does not contain a reference to a
DTD. We may supply this reference by modifying our <tag
class="emptytag">xsl:output</tag> directive:</para>
<programlisting language="none"><xsl:output method="xhtml" indent="yes"
<emphasis role="bold">doctype-public</emphasis>="-//W3C//DTD XHTML 1.0 Strict//EN"
<emphasis role="bold">doctype-system</emphasis>="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/></programlisting>
<para>This adds a corresponding header which allows for validating the
generated HTML:</para>
<programlisting language="none"><!DOCTYPE html
PUBLIC "<emphasis role="bold">-//W3C//DTD XHTML 1.0 Strict//EN</emphasis>"
"<emphasis role="bold">http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</emphasis>">
<html><head> ...</programlisting>
<para>This may be improved further by instructing the XSL formatter
to use <uri
xlink:href="http://www.w3.org/1999/xhtml">http://www.w3.org/1999/xhtml</uri>
as default namespace:</para>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet <emphasis role="bold">xmlns="http://www.w3.org/1999/xhtml"</emphasis>
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="xhtml" indent="yes"
doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
<xsl:template match="/">
<html><head> ...
</xsl:template>
...
</xsl:stylesheet></programlisting>
<para>This yields the following output:</para>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html <emphasis role="bold">xmlns="http://www.w3.org/1999/xhtml"</emphasis>>
<head> ...
</html></programlisting>
<para>The top level element <tag class="element">html</tag> is now
declared to belong to the namespace
<code>http://www.w3.org/1999/xhtml</code>. This will be
inherited by all inner Xhtml elements.</para>
<qandaset defaultlabel="qanda" xml:id="example_xsl_book_1_dtd">
<title>Transforming book instances to Xhtml</title>
<qandadiv>
<qandaentry>
<question>
<para>Create an <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style
sheet to transform instances of the first version of <link
endterm="example_bookDtd"
linkend="example_bookDtd">book.xsd</link> (<xref
linkend="example_bookDtd"/>) into <uri
xlink:href="http://www.w3.org/TR/xhtml1/#a_dtd_XHTML-1.0-Strict">Xhtml
1.0 strict</uri>.</para>
<para>You should first construct an Xhtml document
<emphasis>manually</emphasis> before coding the XSL. After
you have a <quote>working</quote> Xhtml example document
create an <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style
sheet which transforms arbitrary
<filename>book.xsd</filename> document instances into a
corresponding Xhtml file.</para>
</question>
<answer>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output indent="yes" method="xhtml"/>
<xsl:template match="/book">
<html>
<head>
<title><xsl:value-of select="title"/></title>
</head>
<body>
<h1><xsl:value-of select="title"/></h1>
<xsl:apply-templates select="chapter"/>
</body>
</html>
</xsl:template>
<xsl:template match="chapter">
<h2><xsl:value-of select="title"/></h2>
<xsl:apply-templates select="para"/>
</xsl:template>
<xsl:template match="para">
<p><xsl:value-of select="."/></p>
</xsl:template>
</xsl:stylesheet></programlisting>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="section_xsl_attribute">
<title><tag class="starttag">xsl:attribute</tag></title>
<para>Sometimes we want to set attribute values in a generated XML
document. For example we might want to set the background color
<quote>red</quote> if a memo has a priority value of <tag
class="attvalue">high</tag>:</para>
<programlisting language="none"><h1 style="background:red">Firewall problems</h1></programlisting>
<para>Regarding our memo example this may be achieved by:</para>
<programlisting language="none"><xsl:template match="/memo">
<html>
...
<body>
<xsl:variable name="<emphasis role="bold">messageColor</emphasis>" <co
xml:id="programlisting_priority_lolor_vardef"/>>
<xsl:choose>
<xsl:when test="@priority = 'low'">green</xsl:when>
<xsl:when test="@priority = 'medium'">yellow</xsl:when>
<xsl:when test="@priority = 'high'">red</xsl:when>
</xsl:choose>
</xsl:variable>
<h1 style="background:{<emphasis role="bold">$messageColor</emphasis>};" <co
xml:id="programlisting_priority_lolor_usevar"/>>
<xsl:value-of select="subject"/>
</h1>
</body>
</html>
</xsl:template></programlisting>
<calloutlist>
<callout arearefs="programlisting_priority_lolor_vardef">
<para>Definition of a color name depending on the attribute <tag
class="attvalue">priority</tag>'s value. The set of possible
attribute values (low, medium, high) is mapped to the color names
(green, yellow, red).</para>
</callout>
<callout arearefs="programlisting_priority_lolor_usevar">
<para>The color variable is used to compose the attribute <tag
class="attribute">style</tag>'s value. The curly
<code>{...}</code> braces are part of the <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> standard's
syntax. They are required here to instruct the <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor
to substitute the local variable <code>messageColor</code>'s
value instead of simply copying the literal string
<quote><code>$messageColor</code></quote> itself to the output
document e.g. generating <tag class="starttag">h1 style =
"background:$messageColor;"</tag>.</para>
</callout>
</calloutlist>
<para>Instead of constructing an extra variable <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> offers a
slightly more compact way for the same purpose. The <tag
class="starttag">xsl:attribute</tag> element allows us to define the
name of an attribute to be added together with an attribute value
specification:</para>
<programlisting language="none"><xsl:template match="/memo">
<html>
...
<h1>
<xsl:attribute name="<emphasis role="bold">style</emphasis>">
<xsl:text>background:</xsl:text>
<xsl:choose>
<xsl:when test="@priority = 'low'">green</xsl:when>
<xsl:when test="@priority = 'medium'">yellow</xsl:when>
<xsl:when test="@priority = 'high'">red</xsl:when>
</xsl:choose>
</xsl:attribute>
<xsl:value-of select="subject"/>
</h1>
</body>
</html>
</xsl:template></programlisting>
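<para>Both variants produce an empty color value if none of the
three tests succeeds. Assuming that the <tag
class="attribute">priority</tag> attribute may be absent or carry
an unexpected value, a fallback may be sketched by adding an <tag
class="starttag">xsl:otherwise</tag> branch. The color
<code>white</code> is an arbitrary choice for illustration:</para>
<programlisting language="none"><xsl:attribute name="style">
<xsl:text>background:</xsl:text>
<xsl:choose>
<xsl:when test="@priority = 'low'">green</xsl:when>
<xsl:when test="@priority = 'medium'">yellow</xsl:when>
<xsl:when test="@priority = 'high'">red</xsl:when>
<!-- Fallback (assumption): reached if @priority is missing or has another value -->
<xsl:otherwise>white</xsl:otherwise>
</xsl:choose>
</xsl:attribute></programlisting>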
<qandaset defaultlabel="qanda" xml:id="example_book_toc">
<title>Adding a table of contents (toc)</title>
<qandadiv>
<qandaentry>
<question>
<para>For larger document instances it is convenient to add
a table of contents to the generated Xhtml document. <!-- We
demonstrate the desired result as an <uri
xlink:href="src/viewlet/bookhtmltoc/bookhtmltoc_viewlet_swf.html">animation</uri>.--></para>
<para>For this exercise you need a unique string value for
each <tag class="starttag">chapter</tag> node. If a <tag
class="starttag">chapter</tag>'s <tag
class="attribute">id</tag> attribute had been declared as
<code>#REQUIRED</code> its value would do this job
perfectly. Unfortunately you cannot rely on its existence
since it is declared to be <code>#IMPLIED</code> and may
thus be absent.</para>
<para>XSL offers a standard function for this purpose namely
<link
xlink:href="http://www.w3.org/TR/xslt20/#generate-id">generate-id(...)</link>.
In a nutshell this function takes an XML node as an argument
(or being called without arguments it uses the context node)
and creates a string value being unique with respect to
<emphasis>all</emphasis> other nodes in the document. For a
given node the function may be called repeatedly and is
guaranteed to always return the same value during the
<emphasis>same</emphasis> transformation run. So it suffices
to add something like <tag class="starttag">a
href="#{generate-id(...)}"</tag> or use it in conjunction
with <tag class="starttag">xsl:attribute</tag>.</para>
</question>
<answer>
<para>We use the <code>generate-id()</code> function to
create a unique identity string for each chapter node. Since
we also want to define links to the table of contents we
need another unique string value. It is tempting to simply
use a static value like <quote>__toc__</quote> for this
purpose. However we cannot be sure that this value does not
coincide with one of the <code>generate-id()</code>
function return values.</para>
<para>A cleaner solution uses the <tag
class="starttag">book</tag> node's generated identity string
for this purpose. As stated before this value is
definitely unique:</para>
<programlisting language="none"><xsl:template match="/book">
...
<body>
<h1><xsl:value-of select="title"/></h1>
<h2 id="{generate-id(.)}" <co xml:base=""
xml:id="programlisting_book_toc_def_toc"/>>Table of contents</h2>
<ul>
<xsl:for-each select="chapter">
<li>
<a href="#{generate-id(.)}" <co xml:base=""
xml:id="programlisting_book_toc_ref_chap"/>><xsl:value-of select="title"></xsl:value-of></a>
</li>
</xsl:for-each>
</ul>
<xsl:apply-templates select="chapter"/>
</body>
</html>
</xsl:template>
<xsl:template match="chapter">
<h2 id="{generate-id(.)}" <co xml:base=""
xml:id="programlisting_book_toc_def_chap"/>>
<a href="#{generate-id(/book)}" <co xml:base=""
xml:id="programlisting_book_toc_ref_toc"/>>
<xsl:value-of select="title"/>
</a>
</h2>
<xsl:apply-templates select="para"/>
</xsl:template>
...</programlisting>
<calloutlist>
<callout arearefs="programlisting_book_toc_def_toc">
<para>The current context node is <tag
class="starttag">book</tag>. We use it as argument to
<code>generate-id()</code> to create a unique identity
string.</para>
</callout>
<callout arearefs="programlisting_book_toc_ref_chap">
<para>The <tag class="starttag">xsl:for-each</tag>
iterates over all <tag class="starttag">chapter</tag>
nodes. We reference the corresponding target nodes being
created in <xref
linkend="programlisting_book_toc_def_chap"/>.</para>
</callout>
<callout arearefs="programlisting_book_toc_def_chap">
<para>Each <tag class="starttag">chapter</tag>'s heading
is supplied with a unique identity string being
referenced from <xref
linkend="programlisting_book_toc_ref_chap"/>.</para>
</callout>
<callout arearefs="programlisting_book_toc_ref_toc">
<para>Clicking on a chapter's title shall take us back
to the table of contents (toc). So we create a hypertext
link referencing our toc heading's identity string being
defined in <xref
linkend="programlisting_book_toc_def_toc"/>.</para>
</callout>
</calloutlist>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="section_xsl_mixed">
<title>XSL and mixed content</title>
<para>The subsequent example shows an element <tag
class="starttag">content</tag> having a mixed content model possibly
containing <tag class="starttag">url</tag> and <tag
class="starttag">emphasis</tag> child nodes:</para>
<programlisting language="none"><content>The <emphasis
role="bold"><url href="http://w3.org/XML">XML</url></emphasis> language
is <emphasis role="bold"><emphasis>easy</emphasis></emphasis> to learn. However you need
some <emphasis role="bold"><emphasis>time</emphasis></emphasis>.</content></programlisting>
<para>Embedded element nodes have been set to bold style in order to
distinguish them from text nodes. A possible
<acronym>XHtml</acronym> output might look like:</para>
<programlisting language="none"><p>The <emphasis role="bold"><a href="http://w3.org/XML">XML</a></emphasis> language is <emphasis role="bold"><em>easy</em></emphasis> to learn. However you
need some <emphasis role="bold"><em>time</em></emphasis>.</p></programlisting>
<para>We start with a first version of an <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev>
template:</para>
<programlisting language="none"> <xsl:template match="content">
<p>
<xsl:value-of select="."/>
</p>
</xsl:template></programlisting>
<para>As mentioned earlier all <code>#PCDATA</code> text nodes of
the whole subtree are glued together leading to:</para>
<programlisting language="none"><p>The XML language is easy to learn. However you need some time.</p></programlisting>
<para>Our next attempt is to define templates to format the elements
<tag class="starttag">url</tag> and <tag
class="starttag">emphasis</tag>:</para>
<programlisting language="none">...
<xsl:template match="content">
<p>
<xsl:apply-templates select="emphasis|url"/>
</p>
</xsl:template>
<xsl:template match="url">
<a href="{@href}"><xsl:value-of select="."/></a>
</xsl:template>
<xsl:template match="emphasis">
<em><xsl:value-of select="."/></em>
</xsl:template>
...</programlisting>
<para>As expected the sub elements are formatted correctly.
Unfortunately the <code>#PCDATA</code> text nodes between the
element nodes are lost:</para>
<programlisting language="none"><p>
<a href="http://w3.org/XML">XML</a>
<em>easy</em>
<em>time</em>
</p></programlisting>
<para>To correct this transformation script we have to tell the
formatting processor to include bare text nodes into the output. The
<abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev>
standard defines the node test <link
xlink:href="http://www.w3.org/TR/xpath#path-abbrev">text()</link>
for this purpose. It selects all text node children of the
context node:</para>
<programlisting language="none">...
<xsl:template match="content">
<p>
<xsl:apply-templates select="<emphasis role="bold">text()</emphasis>|emphasis|url"/>
</p>
</xsl:template>
...</programlisting>
<para>This yields the desired output. The text nodes of the result
are shown in bold style:</para>
<programlisting language="none"><p><emphasis role="bold">The</emphasis> <a href="http://w3.org/XML">XML</a><emphasis
role="bold"> language is </emphasis><em>easy</em><emphasis
role="bold"> to learn. However
you need some </emphasis><em>time</em><emphasis role="bold">.</emphasis></p></programlisting>
<para>Some remarks:</para>
<orderedlist>
<listitem>
<para>The <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev>
expression <code>select="text()|emphasis|url"</code> corresponds
nicely to the schema's content model definition:</para>
<programlisting language="none"><xs:element name="content">
<xs:complexType <emphasis role="bold">mixed="true"</emphasis>>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element <emphasis role="bold">ref="emphasis"</emphasis>/>
<xs:element <emphasis role="bold">ref="url"</emphasis>/>
</xs:choice>
...
</xs:complexType>
</xs:element></programlisting>
</listitem>
<listitem>
<para>In most mixed content models <emphasis>all</emphasis> sub
elements of e.g. <tag class="starttag" role="">content</tag>
have to be formatted. During development some of the elements
defined in a schema are likely to be omitted by accident. For
this reason the <quote>typical</quote> <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev>
expression acting on mixed content models is defined to match
<emphasis>any</emphasis> sub element nodes:</para>
<programlisting language="none">select="text()|<emphasis
role="bold">*</emphasis>"</programlisting>
</listitem>
<listitem>
<para>Regarding <code>select="text()|emphasis|url"</code> we
have defined two templates for element nodes <tag
class="starttag">emphasis</tag> and <tag
class="starttag">url</tag>. What happens to those text nodes
being matched by <code>text()</code>? These are subject to a
default rule: The content of bare text nodes is written to the
output. We may however redefine this default rule by adding a
template:</para>
<programlisting language="none"><xsl:template match="text()">
<emphasis role="bold"><span style="color:red">
<xsl:value-of select="."/>
</span></emphasis>
</xsl:template></programlisting>
<para>This yields:</para>
<programlisting language="none"><p>
<emphasis role="bold"><span style="color:red">The </span></emphasis>
<a href="http://w3.org/XML">XML</a>
<emphasis role="bold"><span style="color:red"> language is </span></emphasis>
<em>easy</em>
<emphasis role="bold"><span style="color:red"> to learn. However you need some </span></emphasis>
<em>time</em>
<emphasis role="bold"><span style="color:red">.</span></emphasis>
</p></programlisting>
<para>In most cases it is not desired to replace all text nodes
throughout the whole document. In the current example we might
only format text nodes being <emphasis>immediate</emphasis>
children of <tag class="starttag">content</tag>. This may be
achieved by restricting the <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev>
expression to <tag class="starttag">xsl:template
match="content/text()"</tag>.</para>
</listitem>
</orderedlist>
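<para>The restriction mentioned in the last remark may be sketched
as follows. Only text nodes having a <tag
class="starttag">content</tag> parent are matched by this
template, all other text nodes remain subject to the default
rule:</para>
<programlisting language="none"><!-- Matches text nodes being immediate children of content only -->
<xsl:template match="content/text()">
<span style="color:red">
<xsl:value-of select="."/>
</span>
</xsl:template></programlisting>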
</section>
<section xml:id="section_xsl_functionid">
<title>The function <code>id()</code></title>
<para>In <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> we sometimes
want to lookup nodes by an attribute value of type <link
xlink:href="???">ID</link>. We consider our product catalog from
<xref linkend="sectSchemaProductCatalog"/>. The following <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> may be used to
create <acronym>XHtml</acronym> documents from <tag
class="starttag">catalog</tag> instances:</para>
<programlisting language="none" xml:lang=""><xsl:template match="/catalog">
<html>
<head><title>Product catalog</title></head>
<body>
<h1>List of Products</h1>
<xsl:apply-templates select="product"/>
</body>
</html>
</xsl:template>
<xsl:template match="product">
<h2 id="{@id}" <co xml:base=""
xml:id="programlisting_catalog2html_v1_defid"/>><xsl:value-of select="title"/></h2>
<xsl:apply-templates select="para"/>
</xsl:template>
<xsl:template match="para">
<p><xsl:apply-templates select="text()|*" <co
xml:id="programlisting_catalog2html_v1_mixed"/>/></p>
</xsl:template>
<xsl:template match="link">
<a href="#{@ref}" <co xml:id="programlisting_catalog2html_v1_refid"/>><xsl:value-of select="."/></a>
</xsl:template></programlisting>
<calloutlist>
<callout arearefs="programlisting_catalog2html_v1_defid">
<para>The <code>ID</code> attribute <tag
class="starttag">product id="foo"</tag> is unique within the
document instance. We may thus use it as a unique string value
in the generated Xhtml, too.</para>
</callout>
<callout arearefs="programlisting_catalog2html_v1_mixed">
<para>Mixed content consisting of text and <tag
class="starttag">link</tag> nodes.</para>
</callout>
<callout arearefs="programlisting_catalog2html_v1_refid">
<para>We define a file local Xhtml reference to a
product.</para>
</callout>
</calloutlist>
<para>The <tag class="starttag">para</tag> element from the example
document instance containing a <tag class="starttag">link
ref="homeTrainer"</tag> reference will be formatted as:</para>
<programlisting language="none"><p>If you hate rain look <a href="#homeTrainer">here</a>.</p></programlisting>
<para>Now suppose we want to add the product's title <emphasis>Home
trainer</emphasis> here to give the reader an idea about the product
without clicking the hypertext link:</para>
<programlisting language="none"><p>If you hate rain look <a href="#homeTrainer">here</a> <emphasis
role="bold">(Home trainer)</emphasis>.</p></programlisting>
<para>This title text node is part of the <tag
class="starttag">product</tag> node being referenced from the current
<tag class="starttag">para</tag>:</para>
<figure xml:id="linkIdrefProduct">
<title>A graphical representation of our <tag
class="starttag">catalog</tag>.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/xsl_id.fig"/>
</imageobject>
<caption>
<para>The dashed line shows the <code>IDREF</code> based
reference from the <tag class="starttag">link</tag> to the
<tag class="starttag">product</tag> node.</para>
</caption>
</mediaobject>
</figure>
<para>In <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> we may follow
an <code>ID</code> reference by means of the built-in function <link
xlink:href="http://www.w3.org/TR/xpath#function-id">id(...)</link>:</para>
<programlisting language="none"><xsl:template match="link">
<a href="#{@ref}"><xsl:value-of select="."/></a>
<xsl:text> (</xsl:text>
<xsl:value-of select="<emphasis role="bold">id(@ref)</emphasis>/title" <co
xml:id="programlisting_xsl_id_follow"/>/>
<xsl:text>)</xsl:text>
</xsl:template></programlisting>
<para>Evaluating <code>id(@ref)</code> at <xref
linkend="programlisting_xsl_id_follow"/> returns the <tag
class="starttag">product</tag> <emphasis>node</emphasis> whose
<code>ID</code> value matches the reference. We simply take its <tag
class="starttag">title</tag> value and embed it into a pair of
parentheses. This way the desired text portion <emphasis
role="bold">(Home trainer)</emphasis> gets added after the hypertext
link.</para>
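<para>The same lookup can be tried outside of a style sheet. The
following is a minimal sketch using the <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> API shipped with the
<trademark>Java</trademark> platform. The miniature catalog, its internal
DTD subset and the class name <classname>IdLookupSketch</classname> are
assumptions made for illustration. Note that <code>id()</code> only finds
a node if the parser knows that the attribute is of type
<code>ID</code>, which is why the sketch declares it in the DTD
subset:</para>
<programlisting language="java">import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class IdLookupSketch {

    // Hypothetical miniature catalog. The internal DTD subset declares
    // product/@id as ID; without such a declaration id() finds nothing.
    static final String CATALOG =
        "<!DOCTYPE catalog [\n"
      + "  <!ATTLIST product id  ID    #REQUIRED>\n"
      + "  <!ATTLIST link    ref IDREF #REQUIRED>\n"
      + "]>\n"
      + "<catalog>\n"
      + "  <product id='homeTrainer'><title>Home trainer</title></product>\n"
      + "  <para>If you hate rain look <link ref='homeTrainer'>here</link>.</para>\n"
      + "</catalog>";

    // Follows the link's ref attribute and returns the target's title text.
    static String titleOfLinkTarget() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(CATALOG)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // The link element plays the role of the context node.
            Node link = (Node) xpath.evaluate("//link", doc, XPathConstants.NODE);

            // Same expression as in the style sheet, evaluated relative to link.
            return xpath.evaluate("id(@ref)/title", link);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("(" + titleOfLinkTarget() + ")");
    }
}</programlisting>
<para>The expression <code>id(@ref)/title</code> is identical to the one
used in the template. Only the way the context node is obtained
differs.</para>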
<qandaset defaultlabel="qanda" xml:id="example_book_xsl_mixed">
<title>Extending the memo style sheet by mixed content and
itemized lists</title>
<qandadiv>
<qandaentry>
<question>
<para>In <xref linkend="example_book.dtd_v5"/> we
constructed a schema allowing itemized lists and mixed
content for <tag class="starttag">book</tag> instances. This
schema also allows <tag
class="starttag">emphasis</tag>, <tag
class="starttag">table</tag> and <tag
class="starttag">link</tag> elements as part of a mixed
content definition. Extend the current book2html.xsl to
account for these extensions.</para>
<para
xlink:href="http://www.w3.org/TR/xslt20/#element-copy-of">As
we already saw in our memo example, itemized lists in Xhtml
are represented by the element <tag
class="starttag">ul</tag> containing <tag
class="starttag">li</tag> elements. Since <tag
class="starttag">p</tag> elements are also allowed to appear
as children, our itemized lists can be easily mapped to Xhtml
tags. A <tag class="starttag">link</tag> node may be
transformed into an <tag class="starttag">a href="..."</tag>
Xhtml node.</para>
<para>The table model is a simplified version of the Xhtml
table model. Read the <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev>
documentation of the element <tag
class="emptytag">xsl:copy-of</tag> at <link
xlink:href="http://www.w3.org/TR/xslt20/#element-copy-of">copy-of</link>
for processing tables.</para>
</question>
<answer>
<para>The full source code of the solution is available at
<link
xlink:href="Ref/src/Dtd/book/v5/book2html.1.xsl">(Online
HTML version) ... book2html.1.xsl</link>. We discuss some
important aspects. The following table provides mapping
rules from <filename>book.xsd</filename> to Xhtml:</para>
<table xml:id="table_book2xhtml_element_mappings">
<title>Mapping elements from <filename>book.xsd</filename>
to Xhtml</title>
<?dbhtml table-width="50%" ?>
<?dbfo table-width="50%" ?>
<tgroup cols="2">
<colspec colwidth="3*"/>
<colspec colwidth="2*"/>
<thead>
<row>
<entry>book.xsd</entry>
<entry>Xhtml</entry>
</row>
</thead>
<tbody>
<row>
<entry><tag class="starttag">book</tag>/<tag
class="starttag">title</tag></entry>
<entry><tag class="starttag">h1</tag></entry>
</row>
<row>
<entry><tag class="starttag">chapter</tag>/<tag
class="starttag">title</tag></entry>
<entry><tag class="starttag">h2</tag></entry>
</row>
<row>
<entry><tag class="starttag">para</tag> (mixed
content)</entry>
<entry><tag class="starttag">p</tag></entry>
</row>
<row>
<entry><tag class="starttag">link
href="foo"</tag></entry>
<entry><tag class="starttag">a
href="foo"</tag></entry>
</row>
<row>
<entry><tag class="starttag">emphasis</tag></entry>
<entry><tag class="starttag">em</tag></entry>
</row>
<row>
<entry><tag
class="starttag">itemizedlist</tag></entry>
<entry><tag class="starttag">ul</tag></entry>
</row>
<row>
<entry><tag class="starttag">listitem</tag></entry>
<entry><tag class="starttag">li</tag></entry>
</row>
<row>
<entry><tag class="starttag">table</tag>, <tag
class="starttag">caption</tag>,<tag
class="starttag">tr</tag>, <tag
class="starttag">td</tag> along with all
attributes</entry>
<entry>Identity copy</entry>
</row>
</tbody>
</tgroup>
</table>
<para>Since our table model is a subset of the HTML table
model we may simply copy corresponding nodes to the
output:</para>
<programlisting language="none"><xsl:template match="table">
<xsl:copy-of select="."/>
</xsl:template></programlisting>
<para>Next we need rules for itemized lists and paragraphs.
Our model already implements lists in a way that closely
resembles XHTML lists. Since the structures are compatible we
only have to provide a mapping:</para>
<programlisting language="none"><xsl:template match="para">
<p id="{generate-id(.)}"><xsl:apply-templates select="text()|*" /></p>
</xsl:template>
<xsl:template match="itemizedlist">
<ul><xsl:apply-templates select="listitem"/></ul>
</xsl:template>
<xsl:template match="listitem">
<li><xsl:apply-templates select="*"/></li>
</xsl:template></programlisting>
<para>Since <emphasis>all</emphasis> chapters are reachable
via hypertext links from the table of contents we
<emphasis>must</emphasis> supply a unique <code>id</code>
value <xref
linkend="programlisting_book2html_single_chapterid"/> for
<emphasis>all</emphasis> of them. Chapters and paragraphs
may be referenced by <tag class="starttag">link</tag>
elements and thus <emphasis>both</emphasis> need a unique
identity value. For simplicity we create both of them via
<code>generate-id()</code>. In a more sophisticated solution
the strategy would be slightly different:</para>
<itemizedlist>
<listitem>
<para>If a <tag class="starttag">chapter</tag> node does
have an <code>id</code> attribute defined then take its
value.</para>
</listitem>
<listitem>
<para>If a <tag class="starttag">chapter</tag> node does
<emphasis>not</emphasis> have an <code>id</code>
attribute defined then use
<code>generate-id()</code>.</para>
</listitem>
<listitem>
<para><tag class="starttag">para</tag> nodes only get
<code>id</code> values in XHTML if they do have an <code>id</code>
attribute defined. This is consistent since these nodes
are never referenced from the table of contents. Thus an
identity is only required if the <tag
class="starttag">para</tag> node is referenced by a <tag
class="starttag">link</tag>. If that is the case the <tag
class="starttag">para</tag> surely does have a defined
identity value.</para>
</listitem>
</itemizedlist>
<para>We also have to provide a hypertext link <xref
linkend="programlisting_book2html_single_toclink"/> to the
table of contents:</para>
<programlisting language="none"><xsl:template match="chapter">
<h2 id="{<emphasis role="bold">generate-id(.)</emphasis>}" <co
xml:base=""
xml:id="programlisting_book2html_single_chapterid"/>>
<a href="#{<emphasis role="bold">generate-id(/book)</emphasis>}" <co
xml:base=""
xml:id="programlisting_book2html_single_toclink"/>><xsl:value-of select="title"/></a>
</h2>
<xsl:apply-templates select="para|itemizedlist|table"/>
</xsl:template></programlisting>
<para>Implementing the <tag class="starttag">link</tag>
element is somewhat more complicated. We cannot use the
<code>@linkend</code> attribute value itself as <tag
class="starttag">a href="..."</tag> attribute value since
the target's identity string is generated via
<code>generate-id()</code>. But we may follow the reference
via the <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> <link
linkend="section_xsl_functionid">id()</link> function and
then use the target's identity value:</para>
<programlisting language="none"><xsl:template match="link">
<a href="#{generate-id(id(@linkend))}">
<xsl:value-of select="."/>
</a>
</xsl:template></programlisting>
<para>The call to <code>id(@linkend)</code> returns either a
<tag class="starttag">chapter</tag> or a <tag
class="starttag">para</tag> node since attributes of type
<code>ID</code> are only defined for these two elements.
Using this node as input to <code>generate-id()</code>
returns the desired identity value to be used in the
generated Xhtml.</para>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="xslAxis">
<title>XSL axis definitions</title>
<para>XSL allows us to traverse a document instance's graph in
different directions. We start with a memo document instance:</para>
<programlisting language="none"><memo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="memo.xsd" date="9.9.2099">
<from>Joe</from>
<to>Jack</to>
<to>Eve</to>
<to>Jude</to>
<to>Tolstoi</to>
<subject>Ignore me!</subject>
<content>
<para>Dumb text.</para>
</content>
</memo></programlisting>
<para>This instance defines four nodes of type <tag
class="starttag">to</tag>. For each of these we want to create a
line of text showing also the preceding and the following
recipients:</para>
<programlisting language="none"> <----Jack----> Eve Jude Tolstoi <co
xml:id="programlisting_axis_jack"/>
Jack <----Eve----> Jude Tolstoi <co xml:id="programlisting_axis_eve"/>
Jack Eve <----Jude----> Tolstoi <co xml:id="programlisting_axis_jude"/>
Jack Eve Jude <----Tolstoi----> <co
xml:id="programlisting_axis_tolstoi"/></programlisting>
<calloutlist>
<callout arearefs="programlisting_axis_jack">
<para>Jack has no predecessor and 3 successors</para>
</callout>
<callout arearefs="programlisting_axis_eve">
<para>Eve has 1 predecessor and 2 successors</para>
</callout>
<callout arearefs="programlisting_axis_jude">
<para>Jude has 2 predecessors and 1 successor</para>
</callout>
<callout arearefs="programlisting_axis_tolstoi">
<para><personname>Tolstoi</personname> has 3 predecessors and no
successor</para>
</callout>
</calloutlist>
<para>XSL supports this type of transformation by supplying <acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> axis
definitions. We consider a memo document with 9 <tag
class="starttag">to</tag> nodes:</para>
<figure xml:id="memo9recipients">
<title>A memo with 9 recipients</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/memofour.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>We marked the fourth recipient to represent the context node.
All three <tag class="starttag">to</tag> nodes to the
<quote>left</quote> belong to the <emphasis>set</emphasis> of
preceding siblings with respect to the context node. Likewise the 5
neighbours to the right are called following siblings. Returning to
our <quote>four recipient</quote> example we may create the desired
output by:</para>
<programlisting language="none"><xsl:template match="/">
<xsl:apply-templates select="memo/to"/>
</xsl:template>
<xsl:template match="to">
<xsl:for-each select="preceding-sibling::to" <co
xml:id="programlisting_memo_four_xsl_preceding"/>>
<xsl:value-of select="."/>
<xsl:text> </xsl:text>
</xsl:for-each>
<xsl:text> &lt;----</xsl:text>
<xsl:value-of select="."/> <co
xml:id="programlisting_memo_four_xsl_context"/>
<xsl:text>----&gt; </xsl:text>
<xsl:for-each select="following-sibling::to"> <co
xml:id="programlisting_memo_four_xsl_following"/>
<xsl:value-of select="."/>
<xsl:text> </xsl:text>
</xsl:for-each>
<xsl:value-of select="$newline"/>
</xsl:template></programlisting>
<calloutlist>
<callout arearefs="programlisting_memo_four_xsl_preceding">
<para>Iterate on the set of recipients <quote>left</quote> of
the context node.</para>
</callout>
<callout arearefs="programlisting_memo_four_xsl_context">
<para>Taking the context node's value embedded in <code><----
... ----></code>.</para>
</callout>
<callout arearefs="programlisting_memo_four_xsl_following">
<para>Iterate on the set of recipients <quote>right</quote> of
the context node.</para>
</callout>
</calloutlist>
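<para>The predecessor and successor counts listed for the four recipients
can be cross-checked with the same two axes outside of a style sheet. The
following is a minimal sketch using the <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> API of the
<trademark>Java</trademark> platform. The class name
<classname>SiblingAxisSketch</classname> and the shortened memo string are
assumptions made for illustration:</para>
<programlisting language="java">import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SiblingAxisSketch {

    // The four recipient memo, shortened to the elements that matter here.
    static final String MEMO =
        "<memo><from>Joe</from><to>Jack</to><to>Eve</to>"
      + "<to>Jude</to><to>Tolstoi</to><subject>Ignore me!</subject></memo>";

    // Returns one line per recipient: "name: p preceding, f following".
    static String[] describeRecipients() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(MEMO)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList recipients =
                (NodeList) xpath.evaluate("/memo/to", doc, XPathConstants.NODESET);

            String[] lines = new String[recipients.getLength()];
            for (int i = 0; i < lines.length; i++) {
                Node to = recipients.item(i); // acts as the context node
                double preceding = (Double) xpath.evaluate(
                    "count(preceding-sibling::to)", to, XPathConstants.NUMBER);
                double following = (Double) xpath.evaluate(
                    "count(following-sibling::to)", to, XPathConstants.NUMBER);
                lines[i] = to.getTextContent() + ": " + (int) preceding
                    + " preceding, " + (int) following + " following";
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : describeRecipients()) {
            System.out.println(line);
        }
    }
}</programlisting>
<para>Restricting both axes to <code>to</code> keeps the <tag
class="starttag">from</tag> and <tag class="starttag">subject</tag>
siblings out of the count, just as the <code>select</code> attributes in
the template do.</para>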
<para>More formally the set of preceding siblings is defined to be
the set of all nodes having the same parent as the context node and
appearing <quote>before</quote> the context node. The notion
<quote>before</quote> is meant in the sense of a <link
xlink:href="http://en.wikipedia.org/wiki/Depth-first_search">depth-first</link>
traversal of the document tree. <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> provides
different axis definitions, see <uri
xlink:href="http://www.w3.org/TR/xpath#axes">http://www.w3.org/TR/xpath#axes</uri>
for details. We provide an illustration here:</para>
<figure xml:id="disjointAxeSets">
<title>Disjoint <acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> axis
definitions.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/preceding.fig"/>
</imageobject>
<caption>
<para>The sets defined by ancestor, descendant, following,
preceding and self are disjoint. Ignoring attribute and namespace
nodes, their union forms the set of all document nodes.</para>
</caption>
</mediaobject>
</figure>
<para>Some remarks:<itemizedlist>
<listitem>
<para>If the context node is already the topmost node i.e. the
root node then the sets defined by <code>ancestor</code> and
<code>parent</code> are empty.</para>
</listitem>
<listitem>
<para>The <code>parent</code> set <emphasis>always</emphasis>
contains zero or one node.</para>
</listitem>
</itemizedlist></para>
</section>
<section xml:id="xslChunking">
<title>Splitting documents into chunks</title>
<para>Sometimes we want to generate multiple output documents from a
single XML source. It may for example be a bad idea to transform a
book of 200 printed pages into a <emphasis>single</emphasis> online
HTML page. Instead we may split each chapter into a separate HTML
file and create navigation links between them.</para>
<para>We consider a memo document instance. We want to generate one
text file for each memo recipient containing just the recipient's
name using the <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> element <link
xlink:href="http://www.w3.org/TR/xslt20/#element-result-document"><xsl:result-document></link>:</para>
<programlisting language="none"><xsl:template match="/memo">
<xsl:apply-templates select="to"/>
</xsl:template>
<xsl:template match="to">
<emphasis role="bold"><xsl:result-document</emphasis>
<co xml:id="programlisting_xsl_result_document_main"/>
<emphasis role="bold">href="file_{position()}.txt"</emphasis>
<co xml:id="programlisting_xsl_result_document_href"/>
<emphasis role="bold">method="text"</emphasis>
<co xml:id="programlisting_xsl_result_document_method"/>>
<xsl:value-of select="."/> <co
xml:id="programlisting_xsl_result_document_content"/>
<emphasis role="bold"></xsl:result-document></emphasis>
</xsl:template></programlisting>
<calloutlist>
<callout arearefs="programlisting_xsl_result_document_main">
<para>The output from all generating <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> directives
will be redirected from standard output to another output
channel.</para>
</callout>
<callout arearefs="programlisting_xsl_result_document_href">
<para>The output will be written to a file named
<filename>file_i.txt</filename> with decimal number
<code>i</code> ranging from value 1 up to the number of
recipients.</para>
</callout>
<callout arearefs="programlisting_xsl_result_document_method">
<para>The <code>method</code> attribute possibly overrides a
value being given in the <tag class="starttag">xsl:output</tag>
element. We may also redefine <link
xlink:href="http://www.w3.org/TR/xslt20/#element-result-document">other
attributes</link> from <tag class="starttag">xsl:output</tag>
like <code>doctype-public</code>, <code>doctype-system</code> and the
generated file's <code>encoding</code>.</para>
</callout>
<callout arearefs="programlisting_xsl_result_document_content">
<para>All output being generated in this region gets redirected
to the channel specified in <xref
linkend="programlisting_xsl_result_document_href"/>.</para>
</callout>
</calloutlist>
<qandaset defaultlabel="qanda" xml:id="example_book_chunk">
<title>Splitting book into chapter files</title>
<qandadiv>
<qandaentry>
<question>
<para>Extend your solution of <xref
linkend="example_book_xsl_mixed"/> by writing each <tag
class="starttag">chapter</tag>'s content into a separate
Xhtml file. In addition create a file
<filename>index.html</filename> which contains references to
the corresponding <tag class="starttag">chapter</tag>
documents. Thus for a document instance with two chapters
the overall navigation structure is illustrated by <xref
linkend="figure_book_navigation"/>.</para>
<para>Implementing the <tag class="starttag">link</tag> tag
may cause a problem: An internal link may reference a <tag
class="starttag">para</tag>. You need to identify the <tag
class="starttag">chapter</tag> node embedding this para.
This may be done by using a suitable <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> axis
direction.</para>
</question>
<answer>
<para>The full source code of the solution is available at
<link
xlink:href="Ref/src/Dtd/book/v5/book2chunks.1.xsl">(Online
HTML version) ... book2chunks.1.xsl</link>. First we
generate the table of contents file
<filename>index.html</filename>:</para>
<programlisting language="none"><xsl:template match="/">
<xsl:result-document href="index.html">
<xsl:apply-templates select="book"/>
</xsl:result-document>
<xsl:for-each select="book/chapter">
<xsl:result-document href="{generate-id(.)}.html">
<xsl:apply-templates select="."/>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
<xsl:template match="book">
<html>
<head><title><xsl:value-of select="title"/></title></head>
<body>
<h1><xsl:value-of select="title"/></h1>
<h2>Table of contents</h2>
<ul>
<xsl:for-each select="<emphasis role="bold">chapter</emphasis>">
<li><a href="{<emphasis role="bold">generate-id(.)</emphasis>}.html"><xsl:value-of select="title"/></a></li>
</xsl:for-each>
</ul>
</body>
</html>
</xsl:template></programlisting>
<para>The <tag class="starttag">link linkend="..."</tag> may
reference a <tag class="starttag">chapter</tag> or a <tag
class="starttag">para</tag>. So we may need to <quote>step
up</quote> from a paragraph to the corresponding chapter
node:</para>
<programlisting language="none"><xsl:template match="link">
<xsl:variable name="reftargetNode" select="id(@linkend)"/>
<xsl:variable name="reftargetParentChapter"
select="$reftargetNode/ancestor-or-self::chapter"/>
<a href="{generate-id($reftargetParentChapter)}.html#{
generate-id($reftargetNode)}">
<xsl:value-of select="."/>
</a>
</xsl:template></programlisting>
<para>This is consistent since <emphasis>all</emphasis> <tag
class="starttag">p</tag> nodes in the generated Xhtml
receive a unique <code>id</code> value regardless of whether
the originating <tag class="starttag">para</tag> node does
have one.</para>
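<para>The <quote>step up</quote> via
<code>ancestor-or-self::chapter</code> can be observed in isolation. The
following is a minimal sketch using the <abbrev
xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> API of the
<trademark>Java</trademark> platform. The class name
<classname>ChapterLookupSketch</classname> and the miniature book are
assumptions made for illustration. To stay self-contained the target node
is selected by a path rather than by <code>id()</code>:</para>
<programlisting language="java">import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ChapterLookupSketch {

    // Hypothetical book instance with two chapters.
    static final String BOOK =
        "<book><title>Demo</title>"
      + "<chapter><title>Introduction</title><para>First.</para></chapter>"
      + "<chapter><title>Details</title><para>Second.</para></chapter>"
      + "</book>";

    // Returns the title of the chapter that contains, or is, the node
    // selected by pathToTarget.
    static String enclosingChapterTitle(String pathToTarget) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(BOOK)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node target = (Node) xpath.evaluate(pathToTarget, doc, XPathConstants.NODE);

            // For a para this steps up to its chapter, for a chapter the
            // "self" part of the axis selects the node itself.
            return xpath.evaluate("ancestor-or-self::chapter/title", target);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(enclosingChapterTitle("/book/chapter[2]/para"));
        System.out.println(enclosingChapterTitle("/book/chapter[1]"));
    }
}</programlisting>
<para>Using <code>ancestor-or-self</code> instead of
<code>ancestor</code> avoids a case distinction between references to
chapters and references to paragraphs.</para>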
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<figure xml:id="figure_book_navigation">
<title>A <tag class="starttag">book</tag> document with two
chapters</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/booknavigate.fig"/>
</imageobject>
</mediaobject>
</figure>
</section>
</section>
</section>
</chapter>
<chapter xml:id="xmlApis">
<title><abbrev xlink:href="http://en.wikipedia.org/wiki/Api">API</abbrev>s
for XML document processing</title>
<section xml:id="sax">
<title>The Simple API for XML</title>
<section xml:id="saxPrinciple">
<title>The principle of a <acronym
xlink:href="http://www.saxproject.org">SAX</acronym>
application</title>
<para>We are already familiar with transformations of XML document
instances to other formats. Sometimes the capabilities being offered
by a given transformation approach do not suffice for a given problem.
Obviously a general purpose programming language like <link
linkend="gloss_Java"><trademark>Java</trademark></link> offers
superior means to perform advanced manipulations of XML document
trees.</para>
<para>Before diving into technical details we present an example
exceeding the limits of our present transformation capabilities. We
want to format an XML catalog document with article descriptions to
HTML. The price information, however, resides in a database external
to the XML document, namely an RDBMS:</para>
<figure xml:id="saxRdbmsAccessPrinciple">
<title>Generating HTML from an XML document and an RDBMS.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/saxxmlrdbms.fig" scale="65"/>
</imageobject>
</mediaobject>
</figure>
<para>Our catalog might look like:</para>
<figure xml:id="simpleCatalog">
<title>An <link linkend="gloss_XML"><abbrev>XML</abbrev></link> based
catalog.</title>
<programlisting language="none"><catalog>
<item orderNo="<emphasis role="bold">3218</emphasis>">Swinging headset</item>
<item orderNo="<emphasis role="bold">9921</emphasis>">200W Stereo Amplifier</item>
</catalog></programlisting>
</figure>
<para>The RDBMS may hold some relation with a field
<code>orderNo</code> as primary key and a corresponding attribute like
<code>price</code>. In a real world application <code>orderNo</code>
should probably be an integer typed <code>IDENTITY</code>
attribute.</para>
<figure xml:id="saxRdbmsSchema">
<title>A Relation containing price information.</title>
<programlisting language="none">CREATE TABLE Product (
orderNo CHAR(10) PRIMARY KEY
,price Money
);
INSERT INTO Product VALUES('<emphasis role="bold">3218</emphasis>', 42.57);
INSERT INTO Product VALUES('<emphasis role="bold">9921</emphasis>', 121.50);</programlisting>
<caption>
<para>Prices depend on order numbers.</para>
</caption>
</figure>
<para>The intended HTML output with order numbers being highlighted
looks like:</para>
<figure xml:id="saxPriceOut">
<title>HTML generated output.</title>
<programlisting language="none"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head><title>Available products</title></head>
<body>
<table border="1">
<tbody>
<tr>
<th><emphasis role="bold">Order number</emphasis></th>
<th>Price</th>
<th>Product</th>
</tr>
<tr>
<td><emphasis role="bold">3218</emphasis></td>
<td>42,57</td>
<td>Swinging headset</td>
</tr>
<tr>
<td><emphasis role="bold">9921</emphasis></td>
<td>121,50</td>
<td>200W Stereo Amplifier</td>
</tr>
</tbody>
</table>
</body>
</html></programlisting>
<caption>
<para>This result HTML document contains content both from our XML
document and from the database table <code>Product</code>.</para>
</caption>
</figure>
<para>The intended transformation is beyond the XSLT standard's
processing capabilities: XSLT does not enable us to access RDBMS content.
However some XSLT processors provide extensions for this task.</para>
<para>It is tempting to write a <link
linkend="gloss_Java"><trademark>Java</trademark></link> application
which might use e.g. <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
for database access. But how do we actually read and parse an XML file?
Sticking to the <link
linkend="gloss_Java"><trademark>Java</trademark></link> standard we
might use a <link
xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/java/io/FileInputStream.html">FileInputStream</link>
instance to read from <code>catalog.xml</code> and write an XML parser
ourselves. Fortunately <orgname>SUN</orgname>'s <trademark
xlink:href="http://www.oracle.com/technetwork/java/javase">JDK</trademark>
already includes an API denoted <acronym
xlink:href="http://www.saxproject.org">SAX</acronym>, the
<emphasis>S</emphasis>imple <emphasis>A</emphasis>pi for
<emphasis>X</emphasis>ml. The <productname
xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname>
also includes a corresponding parser implementation. In addition there
are third party <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> parser
implementations available like <productname
xlink:href="http://xerces.apache.org">Xerces</productname> from the
<orgname xlink:href="http://www.apache.org">Apache
Foundation</orgname>.</para>
<para>The <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> API is event
based and will be illustrated by the relationship between customers
and a software vendor company:</para>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/updateinfo.fig"/>
</imageobject>
</mediaobject>
<para>After purchasing software customers are asked to register their
software. This way the vendor receives the customer's address. Each
time a new release is being completed all registered customers will
receive a notification typically including a <quote>special
offer</quote> to upgrade their software. From an abstract point of
view the following two actions take place:</para>
<variablelist>
<varlistentry>
<term>Registration</term>
<listitem>
<para>The customer registers at the company's site,
indicating its interest in updated versions.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Notification</term>
<listitem>
<para>Upon completion of each new software release (considered
to be an <emphasis>event</emphasis>) a message is sent to all
registered customers.</para>
</listitem>
</varlistentry>
</variablelist>
<para>The same principle applies to GUI applications in software
development. A key press <emphasis>event</emphasis> for example will
be forwarded by an application's <emphasis>event handler</emphasis> to
a callback function (sometimes called a <emphasis>handler</emphasis>
method) being implemented by an application developer. The <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> API works the
same way: A parser reads an XML document generating events which
<emphasis>may</emphasis> be handled by an application. During document
parsing the XML tree structure gets <quote>flattened</quote> to a
sequence of events:</para>
<figure xml:id="saxFlattenEvent">
<title>Parsing an XML document creates a corresponding sequence of
events.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/saxmodel.pdf"/>
</imageobject>
</mediaobject>
</figure>
<para>An application may register components to the parser:</para>
<figure xml:id="figureSax">
<title><acronym xlink:href="http://www.saxproject.org">SAX</acronym>
Principle</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/saxapparch.pdf"/>
</imageobject>
<caption>
<para>A <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> application
consists of a <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> parser and
an implementation of event handlers being specific to the
application. The application is developed by implementing the
two handlers.</para>
</caption>
</mediaobject>
</figure>
<para>An Error Handler is required since the XML stream may contain
errors. In order to implement a <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> application we
have to:</para>
<orderedlist>
<listitem>
<para>Instantiate required objects:</para>
<itemizedlist>
<listitem>
<para>Parser</para>
</listitem>
<listitem>
<para>Event Handler</para>
</listitem>
<listitem>
<para>Error Handler</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>Register handler instances</para>
<itemizedlist>
<listitem>
<para>register Event Handler to Parser</para>
</listitem>
<listitem>
<para>register Error Handler to Parser</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>Start the parsing process by calling the parser's
appropriate method.</para>
</listitem>
</orderedlist>
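<para>The three steps above can be sketched as follows. This is a
minimal illustration and not the implementation discussed in the next
section: the class names <classname>SaxSetupSketch</classname>,
<classname>ElementCounter</classname> and
<classname>ProblemCollector</classname> are hypothetical, and the document
is read from a string instead of a file:</para>
<programlisting language="java">import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class SaxSetupSketch {

    // Hypothetical event handler: counts element start events.
    static final class ElementCounter extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String namespaceUri, String localName,
                                 String rawName, Attributes attrs) {
            elements++;
        }
    }

    // Hypothetical error handler: counts problems, gives up on fatal ones.
    static final class ProblemCollector implements ErrorHandler {
        int problems = 0;

        @Override
        public void warning(SAXParseException e) { problems++; }

        @Override
        public void error(SAXParseException e) { problems++; }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            problems++;
            throw e; // the document is not well-formed
        }
    }

    static int countElements(String xml) {
        try {
            // Step 1: instantiate parser, event handler and error handler.
            XMLReader parser =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            ElementCounter eventHandler = new ElementCounter();
            ProblemCollector errorHandler = new ProblemCollector();

            // Step 2: register both handlers with the parser.
            parser.setContentHandler(eventHandler);
            parser.setErrorHandler(errorHandler);

            // Step 3: start the parsing process.
            parser.parse(new InputSource(new StringReader(xml)));
            return eventHandler.elements;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements(
            "<catalog><item orderNo='3218'>Swinging headset</item></catalog>"));
    }
}</programlisting>
<para>Keeping the error handler in a class of its own mirrors the list
above. A single class extending
<classname>org.xml.sax.helpers.DefaultHandler</classname> could play both
roles as well.</para>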
</section>
<section xml:id="saxIntroExample">
<title>First steps</title>
<para>Our first <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> toy application
<classname>sax.stat.v1.ElementCount</classname> shall simply count the
number of elements it finds in an arbitrary XML document. In addition
the <acronym xlink:href="http://www.saxproject.org">SAX</acronym>
events shall be written to standard output generating output sketched
in <xref linkend="saxFlattenEvent"/>. The application's central
implementation reads:</para>
<figure xml:id="saxElementCount">
<title>Counting XML elements.</title>
<programlisting language="none">package sax.stat.v1;
...
public class ElementCount {
public void parse(final String uri) {
try {
final SAXParserFactory saxPf = SAXParserFactory.newInstance();
final SAXParser saxParser = saxPf.newSAXParser();
saxParser.parse(uri, eventHandler);
} catch (ParserConfigurationException e){
e.printStackTrace(System.err);
} catch (org.xml.sax.SAXException e) {
e.printStackTrace(System.err);
} catch (IOException e){
e.printStackTrace(System.err);
}
}
public int getElementCount() {
return eventHandler.getElementCount();
}
private final MyEventHandler eventHandler = new MyEventHandler();
}</programlisting>
<caption>
<para>This application works for arbitrary well-formed XML
documents.</para>
</caption>
</figure>
<para>We now explain this application in detail. The first part deals
with the instantiation of a parser:</para>
<programlisting language="none">try {
final SAXParserFactory saxPf = <emphasis role="bold">SAXParserFactory</emphasis>.newInstance();
final SAXParser saxParser = saxPf.newSAXParser();
saxParser.parse(uri, eventHandler);
} catch (ParserConfigurationException e){
e.printStackTrace(System.err);
} ...</programlisting>
<para>In order to keep an application independent from a specific
parser implementation the <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> API uses the
so-called <link
xlink:href="http://www.dofactory.com/Patterns/PatternAbstract.aspx">Abstract
Factory Pattern</link> instead of simply calling a constructor from a
vendor specific parser class.</para>
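<para>The effect of the factory can be made visible by asking the
returned objects for their concrete classes. The following is a minimal
sketch. The class name <classname>ParserFactorySketch</classname> is
hypothetical and the class names being printed depend on the parser
implementation found at runtime:</para>
<programlisting language="java">import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

final class ParserFactorySketch {

    // Configures the factory first, then asks it for a parser.
    static SAXParser newNamespaceAwareParser() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // applies to parsers created afterwards
            return factory.newSAXParser();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Our code only names the abstract types SAXParserFactory and SAXParser.
        System.out.println(SAXParserFactory.newInstance().getClass().getName());
        System.out.println(newNamespaceAwareParser().getClass().getName());
    }
}</programlisting>
<para>Application code refers to the abstract classes
<classname>SAXParserFactory</classname> and
<classname>SAXParser</classname> only. Exchanging the parser
implementation therefore requires no source code change.</para>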
<para>In order to be useful the parser has to be instructed to do
something meaningful when an XML document gets parsed. For this purpose
our application supplies an event handler instance:</para>
<programlisting language="none">public void parse(final String uri) {
try {
final SAXParserFactory saxPf = SAXParserFactory.newInstance();
final SAXParser saxParser = saxPf.newSAXParser();
saxParser.parse(uri, <emphasis role="bold">eventHandler</emphasis>);
} catch (org.xml.sax.SAXException e) {
...
private final MyEventHandler <emphasis role="bold">eventHandler = new MyEventHandler()</emphasis>;
}</programlisting>
<para>What does the event handler actually do? It offers methods to
the parser being callable during the parsing process:</para>
<programlisting language="none">package sax.stat.v1;
...
public class MyEventHandler extends <classname>org.xml.sax.helpers.DefaultHandler</classname> {
public void <emphasis role="bold">startDocument()</emphasis><co
xml:id="programlisting_eventhandler_startDocument"/> {
System.out.println("Opening Document");
}
public void <emphasis role="bold">endDocument()</emphasis><co
xml:id="programlisting_eventhandler_endDocument"/> {
System.out.println("Closing Document");
}
public void <emphasis role="bold">startElement(String namespaceUri, String localName, String rawName,
Attributes attrs)</emphasis> <co
xml:id="programlisting_eventhandler_startElement"/>{
System.out.println("Opening \"" + rawName + "\"");
elementCount++;
}
public void <emphasis role="bold">endElement(String namespaceUri, String localName,
String rawName)</emphasis><co
xml:id="programlisting_eventhandler_endElement"/>{
System.out.println("Closing \"" + rawName + "\"");
}
public void <emphasis role="bold">characters(char[] ch, int start, int length)</emphasis><co
xml:id="programlisting_eventhandler_characters"/>{
System.out.println("Content \"" + new String(ch, start, length) + '"');
}
public int getElementCount() <co
xml:id="programlisting_eventhandler_getElementCount"/>{
return elementCount;
}
private int elementCount = 0;
}</programlisting>
<calloutlist>
<callout arearefs="programlisting_eventhandler_startDocument">
<para>This method gets called exactly once namely when opening the
XML document as a whole.</para>
</callout>
<callout arearefs="programlisting_eventhandler_endDocument">
<para>After successfully parsing the whole document instance this
method will finally be called.</para>
</callout>
<callout arearefs="programlisting_eventhandler_startElement">
<para>This method gets called each time a new element is parsed.
In the given catalog.xml example it will be called three times:
First when the <tag class="starttag">catalog</tag> appears and
then once for each of the two <tag class="starttag">item
...</tag> elements. The supplied parameters depend on whether or
not namespace processing is enabled.</para>
</callout>
<callout arearefs="programlisting_eventhandler_endElement">
<para>Called each time an element like <tag class="starttag">item
...</tag> gets closed by its counterpart <tag
class="endtag">item</tag>.</para>
</callout>
<callout arearefs="programlisting_eventhandler_characters">
<para>This method is responsible for the treatment of textual
content i.e. handling <code>#PCDATA</code> element content. We
will explain its uncommon signature a little bit later.</para>
</callout>
<callout arearefs="programlisting_eventhandler_getElementCount">
<para><function>getElementCount()</function> is a getter method
providing read-only access to the private field
<varname>elementCount</varname>, which gets incremented in <coref
linkend="programlisting_eventhandler_startElement"/> each time an
XML element opens.</para>
</callout>
</calloutlist>
<para>The call <code>saxParser.parse(uri, eventHandler)</code>
actually initiates the parsing process and tells the parser to:</para>
<itemizedlist>
<listitem>
<para>Open the XML document being referenced by the URI
argument.</para>
</listitem>
<listitem>
<para>Forward XML events to the event handler instance supplied by
the second argument.</para>
</listitem>
</itemizedlist>
<para>A driver class containing a <code>main(...)</code> method may
start the whole process and print out the desired number of elements
upon completion of a parsing run:</para>
<programlisting language="none">package sax.stat.v1;
public class ElementCountDriver {
public static void main(String argv[]) {
ElementCount xmlStats = new ElementCount();
xmlStats.parse("<emphasis role="bold">Input/Sax/catalog.xml</emphasis>");
System.out.println("Document contains " + xmlStats.<emphasis role="bold">getElementCount()</emphasis> + " elements");
}
}</programlisting>
<para>Processing the catalog example instance yields:</para>
<programlisting language="none">Opening Document
<emphasis role="bold">Opening "catalog"</emphasis> <co
xml:id="programlisting_catalog_output"/>
Content "
"
<emphasis role="bold">Opening "item"</emphasis> <co
xml:id="programlisting_catalog_item1"/>
Content "Swinging headset"
Closing "item"
Content "
"
<emphasis role="bold">Opening "item"</emphasis> <co
xml:id="programlisting_catalog_item2"/>
Content "200W Stereo Amplifier"
Closing "item"
Content "
"
Closing "catalog"
Closing Document
<emphasis role="bold">Document contains 3 elements</emphasis> <co
xml:id="programlisting_catalog_elementcount"/></programlisting>
<calloutlist>
<callout arearefs="programlisting_catalog_output">
<para>Start parsing element <tag
class="starttag">catalog</tag>.</para>
</callout>
<callout arch="" arearefs="programlisting_catalog_item1">
<para>Start parsing element <tag class="starttag">item
orderNo="3218"</tag>Swinging headset<tag class="endtag"
role="">item</tag>.</para>
</callout>
<callout arch="" arearefs="programlisting_catalog_item2">
<para>Start parsing element <tag class="starttag">item
orderNo="9921"</tag>200W Stereo Amplifier<tag class="endtag"
role="">item</tag>.</para>
</callout>
<callout arearefs="programlisting_catalog_elementcount">
<para>After the parsing process has completed the application
outputs the number of elements counted during the run.</para>
</callout>
</calloutlist>
<para>The output contains some lines of <quote>empty</quote> content.
This content is due to whitespace located between elements. For
example a newline appears between the <tag
class="starttag">catalog</tag> and the first <tag
class="starttag">item</tag> element. The parser passes this
whitespace to the application by calling the <link
xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)">characters</link>
method. Applications will typically ignore such calls. Machine
generated XML document instances frequently do not contain any
newline characters at all: the whole document is represented as a
single line. This impairs human readability, which is not required as
long as the processing applications work well. In this case empty
content as above will not appear.</para>
<para>The <code>characters(char[] ch, int start, int length)</code>
method's signature looks somewhat strange regarding <link
linkend="gloss_Java"><trademark>Java</trademark></link> conventions.
One might expect <code>characters(String s)</code>. But this way the
<acronym xlink:href="http://www.saxproject.org">SAX</acronym> API
allows efficient parser implementations: A parser may initially
allocate a reasonably large <code>char</code> array of, say, 64
elements occupying 128 bytes and thus holding 64 (<link
xlink:href="http://unicode.org">Unicode</link>) characters. If this
buffer gets exhausted the parser might allocate a second buffer of
double size thus implementing an <quote>amortized doubling</quote>
algorithm:</para>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/saxcharacter.pdf"/>
</imageobject>
</mediaobject>
<para>In this example the first element content fits in the first
buffer. The second content <code>200W Stereo Amplifier</code> and the
third content <code>Earphone</code> both fit in the second buffer.
Subsequent content may require further buffer allocations. Such a
strategy avoids the time-consuming <code>new </code>
<link
xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html">String</link>
<code>(...)</code> constructor calls which the more
convenient API variant <code>characters(String s)</code> would require.</para>
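<para>Since <code>characters(...)</code> only hands over a window into
the parser's buffer, and since SAX permits a parser to deliver
contiguous text in several chunks, a handler needing the complete
text of an element should copy and accumulate the chunks itself. The
following is a minimal sketch of this idea. The class
<code>ItemTextHandler</code> and its helper
<code>itemTexts(...)</code> are hypothetical names introduced for
illustration only:</para>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler collecting the complete text of each <item> element. */
class ItemTextHandler extends DefaultHandler {
    private final List<String> itemTexts = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private boolean insideItem = false;

    @Override
    public void startElement(String namespaceUri, String localName,
                             String rawName, Attributes attrs) {
        if ("item".equals(rawName)) {
            insideItem = true;
            buffer.setLength(0); // Start each item with an empty buffer.
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (insideItem) {
            // Copy only our window [start, start + length) of the parser's array.
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String namespaceUri, String localName, String rawName) {
        if ("item".equals(rawName)) {
            insideItem = false;
            itemTexts.add(buffer.toString()); // The text is complete only now.
        }
    }

    /** Parses the given XML text and returns the content of all <item> elements. */
    static List<String> itemTexts(String xml) {
        try {
            final ItemTextHandler handler = new ItemTextHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.itemTexts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        final String xml = "<catalog>\n"
            + "  <item orderNo=\"3218\">Swinging headset</item>\n"
            + "  <item orderNo=\"9921\">200W Stereo Amplifier</item>\n"
            + "</catalog>";
        System.out.println(itemTexts(xml)); // [Swinging headset, 200W Stereo Amplifier]
    }
}
```

<para>The whitespace between the elements is not collected since the
handler only appends text while an <tag class="starttag">item</tag>
element is open.</para>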
</section>
<section xml:id="saxRegistry">
<title>Event- and error handler registration</title>
<para>Our first <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> application
suffers from the following deficiencies:</para>
<itemizedlist>
<listitem>
<para>The error handling is very sparse. It relies completely on
exceptions like <link
xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/SAXException.html">SAXException</link>
being thrown, which frequently do not supply meaningful error
information.</para>
</listitem>
<listitem>
<para>The application is not aware of namespaces. Thus when reading
e.g. <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev>
document instances it cannot distinguish between elements
belonging to different namespaces, for example XSL and HTML.</para>
</listitem>
<listitem>
<para>The parser will not validate a document instance against a
schema, even if one is present.</para>
</listitem>
</itemizedlist>
<para>We now incrementally add these features to the <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> parsing process.
<acronym xlink:href="http://www.saxproject.org">SAX</acronym> offers
an interface <link
xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/XMLReader.html">XMLReader</link>
to conveniently <emphasis>register</emphasis> event- and error handler
instances independently instead of passing both interfaces as a single
argument to the <link
xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/SAXParser.html#parse(java.lang.String,%20org.xml.sax.helpers.DefaultHandler)">parse</link>
method. We first code an error handler class by implementing the
interface <classname>org.xml.sax.ErrorHandler</classname> being part
of the <acronym xlink:href="http://www.saxproject.org">SAX</acronym>
API:</para>
<programlisting language="none">package sax.stat.v2;
...
public class MyErrorHandler implements ErrorHandler {
  <emphasis role="bold">public void warning(SAXParseException e)</emphasis> {
    System.err.println("[Warning]" + getLocationString(e));
  }
  <emphasis role="bold">public void error(SAXParseException e)</emphasis> {
    System.err.println("[Error]" + getLocationString(e));
  }
  <emphasis role="bold">public void fatalError(SAXParseException e)</emphasis> throws SAXException{
    System.err.println("[Fatal Error]" + getLocationString(e));
  }
  private String getLocationString(SAXParseException e) {
    return " line " + e.getLineNumber() +
      ", column " + e.getColumnNumber()+ ":" + e.getMessage();
  }
}</programlisting>
<para>The three public methods implement the
<classname>org.xml.sax.ErrorHandler</classname> interface. The private
method <function>getLocationString</function> supplies precise
parsing error locations by means of line and column numbers within a
document instance. If errors or warnings are encountered the parser
will call the appropriate public method:</para>
<figure xml:id="saxMissItem">
<title>A non well formed document.</title>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<catalog>
<item orderNo="3218">Swinging headset</item>
<item orderNo="9921">200W Stereo Amplifier
</catalog></programlisting>
<caption>
<para>This document is not well-formed since a closing <tag
class="endtag">item</tag> tag is missing.</para>
</caption>
</figure>
<para>Our error handler method gets called yielding an informative
message:</para>
<programlisting language="none">[Fatal Error] line 5, column -1:Expected "</item>" to terminate
element starting on line 4.</programlisting>
<para>This error output is achieved by
<emphasis>registering</emphasis> an instance of
<classname>sax.stat.v2.MyErrorHandler</classname> to the parser prior
to starting the parsing process. In the following code snippet we also
register a content handler instance to the parser and thus separate
the parser's configuration from its invocation:</para>
<programlisting language="none">package sax.stat.v2;
...
public class ElementCount {
public ElementCount()
throws SAXException, ParserConfigurationException{
final SAXParserFactory saxPf = SAXParserFactory.newInstance();
final SAXParser saxParser = saxPf.newSAXParser();
xmlReader = saxParser.getXMLReader();
xmlReader.setContentHandler(eventHandler); <co
xml:id="programlisting_assemble_parser_setcontenthandler"/>
xmlReader.setErrorHandler(errorHandler); <co
xml:id="programlisting_assemble_parser_seterrorhandler"/>
}
public void parse(final String uri)
throws IOException, SAXException{
xmlReader.parse(uri); <co
xml:id="programlisting_assemble_parser_invokeparse"/>
}
public int getElementCount() {
return eventHandler.getElementCount(); <co
xml:id="programlisting_assemble_parser_getelementcount"/>
}
private final XMLReader xmlReader;
private final MyEventHandler eventHandler = new MyEventHandler(); <co
xml:id="programlisting_assemble_parser_createeventhandler"/>
private final MyErrorHandler errorHandler = new MyErrorHandler(); <co
xml:id="programlisting_assemble_parser_createerrorhandler"/>
}</programlisting>
<calloutlist>
<callout arearefs="programlisting_assemble_parser_setcontenthandler programlisting_assemble_parser_seterrorhandler">
<para>Referring to <xref linkend="figureSax" os=""/> these two
calls attach the event- and error handler objects to the parser
thus implementing the two arrows from the parser to the
application's implementation.</para>
</callout>
<callout arearefs="programlisting_assemble_parser_invokeparse">
<para>The parser is invoked. Note that in this example we only
pass a document's URI but no reference to a handler object.</para>
</callout>
<callout arearefs="programlisting_assemble_parser_getelementcount">
<para>The method <function>getElementCount()</function> is needed
to allow a calling object to access the private
<varname>eventHandler</varname> object's
<function>getElementCount()</function> method.</para>
</callout>
<callout arearefs="programlisting_assemble_parser_createeventhandler programlisting_assemble_parser_createerrorhandler">
<para>An event handling and an error handling object are created
to handle events during the parsing process.</para>
</callout>
</calloutlist>
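<para>The registration mechanism can be tried out without any files by
parsing XML text held in a string. The following sketch assumes a
hypothetical class <code>RecordingErrorHandler</code> which collects
messages in a list rather than printing them like
<classname>sax.stat.v2.MyErrorHandler</classname> does. The helper
<code>check(...)</code> is likewise introduced for illustration
only:</para>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical error handler recording messages instead of printing them. */
class RecordingErrorHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    @Override public void warning(SAXParseException e)    { record("[Warning]", e); }
    @Override public void error(SAXParseException e)      { record("[Error]", e); }
    @Override public void fatalError(SAXParseException e) { record("[Fatal Error]", e); }

    private void record(String level, SAXParseException e) {
        messages.add(level + " line " + e.getLineNumber() + ": " + e.getMessage());
    }

    /** Parses the XML text and returns all messages reported to the error handler. */
    static List<String> check(String xml) {
        final RecordingErrorHandler errorHandler = new RecordingErrorHandler();
        try {
            final XMLReader xmlReader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            xmlReader.setContentHandler(new DefaultHandler()); // Ignores all content events.
            xmlReader.setErrorHandler(errorHandler);
            xmlReader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // A parser may abort the run after a fatal error. The handler
            // has been notified before, so the message is already recorded.
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return errorHandler.messages;
    }

    public static void main(String[] args) {
        final String missingEndTag = "<catalog>\n"
            + "<item orderNo=\"3218\">Swinging headset</item>\n"
            + "<item orderNo=\"9921\">200W Stereo Amplifier\n"
            + "</catalog>";
        // The exact wording of the message depends on the parser in use.
        check(missingEndTag).forEach(System.out::println);
    }
}
```

<para>Collecting messages rather than printing them is merely a design
choice of this sketch: It allows a calling method to decide how to
present the problems being found.</para>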
<para>The careful reader might notice a subtle difference between the
content- and the error handler implementation: The class
<classname>sax.stat.v2.MyErrorHandler</classname> implements the
interface <classname>org.xml.sax.ErrorHandler</classname>. But
<classname>sax.stat.v2.MyEventHandler</classname> is derived from
<classname>org.xml.sax.helpers.DefaultHandler</classname> which itself
implements the <classname>org.xml.sax.ContentHandler</classname>
interface. Actually one might as well start from the latter interface,
which requires implementing all of its 11 methods. In most circumstances
this only complicates the application's code since it is unnecessary
to react to events belonging for example to processing instructions.
For this reason it is good coding practice to use the empty default
implementations in
<classname>org.xml.sax.helpers.DefaultHandler</classname> and to
redefine only those methods corresponding to events actually being
handled by the application in question.</para>
<qandaset defaultlabel="qanda" xml:id="sda1SaxReadAttributes">
<title>SAX and attribute values</title>
<qandadiv>
<qandaentry>
<question>
<label>Reading an element's set of attributes.</label>
<para>The example document instance does include <tag
class="attribute">orderNo</tag> attribute values for each <tag
class="starttag">item</tag> element. The parser does not yet
show these attribute keys and their corresponding values. Read
the documentation for <classname
xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/Attributes.html">org.xml.sax.Attributes</classname>
and extend the given code to use it.</para>
<para>You should start from the <xref linkend="glo_MIB"/>
Maven archetype <code>mi-maven-archetype-sax</code>.
Configuration hints are available at <uri
xlink:href="http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.swd1/sw1Resources.html">http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.swd1/sw1Resources.html</uri>.</para>
</question>
<answer>
<para>For the given example it would suffice to read the known
<tag class="attribute">orderNo</tag> attribute's value. A
generic solution may ask for the set of all defined attributes
and show their values:</para>
<programlisting language="none">package sax;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class AttribEventHandler extends DefaultHandler {
  @Override
  public void startElement(String namespaceUri, String localName,
                           String rawName, Attributes attrs) {
    System.out.println("Opening Element " + rawName);
    // Generic approach: iterate over all attributes of the current element.
    for (int i = 0; i < attrs.getLength(); i++) {
      System.out.println(attrs.getQName(i) + "=" + attrs.getValue(i) + "\n");
    }
  }
}</programlisting>
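<para>The simpler variant mentioned first, namely reading just the
known <tag class="attribute">orderNo</tag> attribute, might be
sketched as follows. The class name <code>OrderNoHandler</code> and
the helper <code>orderNumbers(...)</code> are hypothetical and serve
illustration purposes only:</para>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler collecting the orderNo attribute values of a catalog. */
class OrderNoHandler extends DefaultHandler {
    private final List<String> orderNumbers = new ArrayList<>();

    @Override
    public void startElement(String namespaceUri, String localName,
                             String rawName, Attributes attrs) {
        // getValue(String) yields null if the element lacks the attribute,
        // as is the case for the <catalog> element.
        final String orderNo = attrs.getValue("orderNo");
        if (orderNo != null) {
            orderNumbers.add(orderNo);
        }
    }

    /** Parses the given XML text and returns all orderNo values in document order. */
    static List<String> orderNumbers(String xml) {
        try {
            final OrderNoHandler handler = new OrderNoHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.orderNumbers;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        final String xml = "<catalog>"
            + "<item orderNo=\"3218\">Swinging headset</item>"
            + "<item orderNo=\"9921\">200W Stereo Amplifier</item>"
            + "</catalog>";
        System.out.println(orderNumbers(xml)); // [3218, 9921]
    }
}
```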
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<section xml:id="sda1SecElementLists">
<title>The set of element names</title>
<qandaset defaultlabel="qanda" xml:id="sda1QandaElementNames">
<title>Element lists of arbitrary XML documents.</title>
<qandadiv>
<qandaentry>
<question>
<para>We reconsider the simple application reading arbitrary
XML documents and providing a list of XML Elements being
contained within:</para>
<programlisting language="none">Opening Document
<emphasis role="bold">Opening "catalog"</emphasis>
Content "
"
<emphasis role="bold">Opening "item"</emphasis>
Content "Swinging headset"
Closing "item"
Content " ...</programlisting>
<para>If an element like e.g. <tag
class="starttag">item</tag> appears multiple times it will
also be written to standard output multiple times.</para>
<para>We are now interested in the list of all element
names being present in an arbitrary XML document. Consider
the following example:</para>
<programlisting language="none"><memo>
<from>
<name>Martin</name>
<surname>Goik</surname>
</from>
<to>
<name>Adam</name>
<surname>Hacker</surname>
</to>
<to>
<name>Eve</name>
<surname>Intruder</surname>
</to>
<date year="2005" month="1" day="6"/>
<subject>Firewall problems</subject>
<content>
<para>Thanks for your excellent work.</para>
<para>Our firewall is definitely broken!</para>
</content>
</memo></programlisting>
<para>The elements <tag class="starttag">to</tag>, <tag
class="starttag">name</tag>, <tag
class="starttag">surname</tag> and <tag
class="starttag">para</tag> all appear multiple times.
Write a SAX application which processes arbitrary XML
documents and creates an alphabetically sorted list of
elements being contained <emphasis role="bold">excluding
duplicates</emphasis>. The intended output for the above
example is:</para>
<programlisting language="none">List of elements: {content date from memo name para subject surname to }</programlisting>
<para>The corresponding handler should be implemented in a
re-usable way. Thus if different XML documents are being
handled in succession the list of elements should be erased
prior to processing the current document. Hints:</para>
<itemizedlist>
<listitem>
<para>Use a <classname>java.util.SortedSet</classname>
instance to collect element names thereby excluding
duplicates.</para>
</listitem>
<listitem>
<para>The method
<methodname>sax.count.ListTagNamesHandler.startDocument()</methodname>
may be used to initialize your handler.</para>
</listitem>
</itemizedlist>
</question>
<answer>
<para>A suitable handler reads:</para>
<programlisting language="none">package sax.count;

import java.util.SortedSet;
import java.util.TreeSet;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Collecting the names of all elements of a document excluding duplicates */
public class ListTagNamesHandler extends DefaultHandler {
  // A SortedSet by definition does not contain any duplicates.
  private final SortedSet<String> elementNames = new TreeSet<>();

  @Override
  public void startDocument() throws SAXException {
    elementNames.clear(); // May contain elements from a previous run.
  }

  @Override
  public void startElement(String namespaceUri, String localName,
                           String rawName, Attributes attrs) {
    // In case the current element name has already been inserted
    // this method call will be silently ignored.
    elementNames.add(rawName);
  }

  /**
   * @return A sorted list of element names of the currently processed XML
   *         document without duplicates.
   */
  public String[] getTagNames() {
    return elementNames.toArray(new String[0]);
  }
}</programlisting>
<para>A complete application requires a driver:</para>
<programlisting language="none">package sax.count;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import sax.stat.v2.MyErrorHandler;

public class Driver {
  public static void main(String[] argv) throws Exception {
    final SAXParserFactory saxPf = SAXParserFactory.newInstance();
    final SAXParser saxParser = saxPf.newSAXParser();
    final XMLReader xmlReader = saxParser.getXMLReader();
    final ListTagNamesHandler handler = new ListTagNamesHandler();
    xmlReader.setContentHandler(handler);
    xmlReader.setErrorHandler(new MyErrorHandler());
    xmlReader.parse("Input/Xml/Memo/message.xml");
    System.out.print("List of elements: {");
    for (String elementName : handler.getTagNames()) {
      System.out.print(elementName + " ");
    }
    System.out.println("}");
  }
}</programlisting>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="sda1SaxView">
<title>A limited view on a given XML document instance</title>
<qandaset defaultlabel="qanda" xml:id="sda1QandamemoView">
<title>A specific view on memo documents</title>
<qandadiv>
<qandaentry>
<question>
<para>We reconsider the following memo instance:</para>
<programlisting language="none"><memo>
<from>
<name>Martin</name>
<surname>Goik</surname>
</from>
<to>
<name>Adam</name>
<surname>Hacker</surname>
</to>
<to>
<name>Eve</name>
<surname>Intruder</surname>
</to>
<date year="2005" month="1" day="6"/>
<subject>Firewall problems</subject>
<content>
<para>Thanks for your excellent work.</para>
<para>Our firewall is definitely broken!</para>
</content>
</memo></programlisting>
<para>Every memo instance does have exactly one sender and
one subject. Write a SAX application to achieve the
following output:</para>
<programlisting language="none">Sender: Martin Goik
Subject: Firewall problems</programlisting>
<para>Hint: The callback implementation of
<methodname>org.xml.sax.helpers.DefaultHandler.characters(char[],int,int)</methodname>
may be used to filter the desired output. You have to limit
its output to <tag class="starttag">from</tag> and <tag
class="starttag">subject</tag> descendant content. Taking
the <tag class="starttag">subject</tag>Firewall problems<tag
class="endtag">subject</tag> element as an example the
corresponding event sequence reads:</para>
<informaltable border="1">
<tr>
<th>Event</th>
<th>Corresponding callback</th>
</tr>
<tr>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>Opening <tag class="starttag">subject</tag>
element</td>
<td>startElement(...)</td>
</tr>
<tr>
<td>Firewall problems</td>
<td>characters(...)</td>
</tr>
<tr>
<td>Closing <tag class="endtag">subject</tag>
element</td>
<td>endElement(...)</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
</tr>
</informaltable>
<para>Limiting output of our
<methodname>org.xml.sax.helpers.DefaultHandler.characters(char[],int,int)</methodname>
callback method can be achieved by introducing instance
scope boolean variables being set to true or false inside
your
<methodname>org.xml.sax.helpers.DefaultHandler.startElement(String
uri,String localName,String qName,org.xml.sax.Attributes
attributes)</methodname> and
<methodname>org.xml.sax.helpers.DefaultHandler.endElement(String
uri, String localName, String qName)</methodname>
implementations accordingly to keep track of the current
event state.</para>
</question>
<answer>
<programlisting language="none">package sax.view;
...
/** A view on memo documents restricting to sender name and subject. */
public class MemoViewHandler extends DefaultHandler {
  // These variables help us to keep track of the current event state spanning
  // each startElement(...) -- characters(...) -- endElement(...) event sequence
  boolean inFromContext = false,
          inSubjectContext = false;

  @Override
  public void startElement(String namespaceUri, String localName,
                           String rawName, Attributes attrs) {
    switch(rawName) {
    case "from":
      inFromContext = true;
      System.out.print("Sender: ");
      break;
    case "subject":
      inSubjectContext = true;
      System.out.print("Subject: ");
      break;
    case "surname":
      if (inFromContext) {
        System.out.print(" "); // Adding additional space between <name> and <surname> content.
      }
      break;
    }
  }

  @Override
  public void endElement(String uri, String localName, String rawName)
      throws SAXException {
    switch(rawName) {
    case "from":
      inFromContext = false;
      System.out.println();
      break;
    case "subject":
      inSubjectContext = false;
      System.out.println();
      break;
    }
  }

  @Override
  public void characters(char[] ch, int start, int length) throws SAXException {
    if (inFromContext || inSubjectContext) {
      final String text = new String(ch, start, length);
      // Skip whitespace-only chunks like the line breaks between <name>
      // and <surname>, which would otherwise clutter the output.
      if (!text.trim().isEmpty()) {
        System.out.print(text);
      }
    }
  }
}</programlisting>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
</section>
<section xml:id="saxValidate">
<title><acronym xlink:href="http://www.saxproject.org">SAX</acronym>
validation</title>
<para>So far we have only checked whether document instances are well
formed. Our current parser does not check validity and will thus
process invalid XML instances without complaint:</para>
<figure xml:id="saxNotValid">
<title>An invalid XML document.</title>
<programlisting language="none"><xs:element name="catalog">
<xs:complexType>
<xs:sequence>
<xs:element ref="item"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="item">
<xs:complexType mixed="true">
<xs:attribute name="orderNo" type="xs:int" use="required"/>
</xs:complexType>
</xs:element></programlisting>
<programlisting language="none"><catalog>
<item orderNo="3218">Swinging headset</item>
<item orderNo="9921">200W Stereo Amplifier</item> <emphasis
role="bold"><!-- second entry forbidden by schema --></emphasis>
</catalog></programlisting>
<caption>
<para>In contrast to <xref linkend="saxMissItem"/> this document
is well formed. But it is not <emphasis
role="bold">valid</emphasis> with respect to the schema since more
than one <tag class="starttag">item</tag> element is
present.</para>
</caption>
</figure>
<para>This document instance is well-formed but not valid: The
(ill-defined) schema allows only one <tag class="starttag">item</tag>
element. Nevertheless the parser will not report any error or warning.
In order to enable validation we need to configure our parser:</para>
<programlisting language="none">xmlReader.setFeature("http://xml.org/sax/features/validation", true);</programlisting>
<para>The string <code>http://xml.org/sax/features/validation</code>
serves as a key. Since feature keys are ordinary string values a
parser may or may not recognize a given key. The <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> standard defines
two exception classes for dealing with feature related errors:</para>
<variablelist>
<varlistentry>
<term><link
xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/SAXNotRecognizedException.html">SAXNotRecognizedException</link></term>
<listitem>
<para>The feature is not known to the parser.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><link
xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/SAXNotSupportedException.html">SAXNotSupportedException</link></term>
<listitem>
<para>The feature is known to the parser but the parser does not
support it or it does not support the requested value.</para>
</listitem>
</varlistentry>
</variablelist>
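<para>An application may catch both exceptions in order to find out
whether a feature has actually been activated. The following is a
minimal sketch under the assumption of a hypothetical helper class
<code>FeatureProbe</code>. The second feature key in
<code>main(...)</code> is made up and merely serves to provoke the
<quote>not recognized</quote> case:</para>

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

/** Hypothetical helper probing a parser for feature support. */
class FeatureProbe {

    /** Tries to switch the given feature on and describes the outcome. */
    static String tryEnable(XMLReader xmlReader, String featureKey) {
        try {
            xmlReader.setFeature(featureKey, true);
            return "enabled";
        } catch (SAXNotRecognizedException e) {
            return "not recognized"; // The key is unknown to this parser.
        } catch (SAXNotSupportedException e) {
            return "not supported";  // Known key, but the value true is rejected.
        }
    }

    /** Creates a reader from the platform's default SAX parser. */
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        final XMLReader xmlReader = newReader();
        // The outcome depends on the parser implementation being used.
        System.out.println(tryEnable(xmlReader, "http://xml.org/sax/features/validation"));
        System.out.println(tryEnable(xmlReader, "http://example.org/no-such-feature"));
    }
}
```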
<para>Validating a document requires access to the DTD or schema it
refers to. XML catalogs map public and system identifiers to local
copies of such resources. The <productname
xlink:href="http://projects.apache.org/projects/xml_commons_resolver.html">xml-commons
resolver project</productname> offers an implementation being able to
process various catalog file formats. Maven based projects can import
the corresponding library by adding the following
dependency:</para>
<programlisting language="none"><dependency>
<groupId>xml-resolver</groupId>
<artifactId>xml-resolver</artifactId>
<version>1.2</version>
</dependency></programlisting>
<para>We need a properties file <link
xlink:href="http://xerces.apache.org/xml-commons/components/resolver/tips.html">CatalogManager.properties</link>
defining XML catalogs to be used and additional parameters:</para>
<programlisting language="none"># Catalogs are relative to this properties file
relative-catalogs=false
# Catalog list
catalogs=\
/usr/share/eclipse/dropins/oxygenxml.oxygen_14.2/plugins/com.oxygenxml.editor_14.2.0.v2013021115/frameworks/xhtml/dtd/xhtmlcatalog.xml;\
/usr/share/eclipse/dropins/oxygenxml.oxygen_14.2/plugins/com.oxygenxml.editor_14.2.0.v2013021115/frameworks/xhtml11/dtd/xhtmlcatalog.xml
# PUBLIC in favour of SYSTEM
prefer=public</programlisting>
<para>This configuration uses some catalogs from the
<trademark>Oxygen</trademark> <trademark>Eclipse</trademark> plugin.
We may now add a resolver to our SAX application by referencing the
above configuration file <coref linkend="resolverPropertyFile"/> and
registering the resolver to our SAX parser instance <coref
linkend="resolverRegister"/>:</para>
<programlisting language="none">xmlReader = saxParser.getXMLReader();
// Set up resolving PUBLIC identifier
final CatalogManager cm = new CatalogManager("<emphasis role="bold">CatalogManager.properties</emphasis>" <co
xml:id="resolverPropertyFile"/> );
final CatalogResolver resolver = new CatalogResolver(cm);
xmlReader.setEntityResolver(resolver) <co xml:id="resolverRegister"/>;</programlisting>
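<para>Returning to the invalid catalog from <xref
linkend="saxNotValid"/>: depending on the parser the validation
feature alone may only cover DTDs. The following sketch therefore
takes a different route and uses the JAXP validation API
(<code>javax.xml.validation</code>) to hand a compiled XML schema to
the parser factory. This is an alternative technique and not the
feature based configuration shown above. The class name
<code>SchemaValidationSketch</code> and the helper
<code>validate(...)</code> are hypothetical:</para>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch validating catalog instances against an XML schema. */
class SchemaValidationSketch {

    // The schema of the example above wrapped in an <xs:schema> root element.
    static final String CATALOG_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='catalog'><xs:complexType><xs:sequence>"
        + "<xs:element ref='item'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "<xs:element name='item'><xs:complexType mixed='true'>"
        + "<xs:attribute name='orderNo' type='xs:int' use='required'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Returns the validation errors reported for the given XML text. */
    static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            final Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(CATALOG_XSD)));
            final SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // Schema validation relies on namespaces.
            factory.setSchema(schema);
            final DefaultHandler handler = new DefaultHandler() {
                @Override
                public void error(SAXParseException e) { // Validity violations end up here.
                    problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
                }
            };
            final XMLReader xmlReader = factory.newSAXParser().getXMLReader();
            xmlReader.setContentHandler(handler);
            xmlReader.setErrorHandler(handler);
            xmlReader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        final String twoItems = "<catalog>\n"
            + "<item orderNo=\"3218\">Swinging headset</item>\n"
            + "<item orderNo=\"9921\">200W Stereo Amplifier</item>\n"
            + "</catalog>";
        // Message texts are parser specific and thus not reproduced here.
        validate(twoItems).forEach(System.out::println);
    }
}
```

<para>Validity violations are reported as recoverable errors. The
parser thus continues after reporting the second <tag
class="starttag">item</tag> element, and it is up to the application
to decide whether to reject the document.</para>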
</section>
<section xml:id="saxNamespace">
<title>Namespaces</title>
<para>In order to make a <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> parser
application namespace aware we have to activate two <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> parsing
features:</para>
<programlisting language="none">xmlReader = saxParser.getXMLReader();
xmlReader.setFeature("http://xml.org/sax/features/namespaces", true);
xmlReader.setFeature("http://xml.org/sax/features/namespace-prefixes", true);</programlisting>
<para>This instructs the parser to pass the namespace's name for each
element. Namespace prefixes like <code>xsl</code> in <tag
class="starttag">xsl:for-each</tag> are also passed and may be used by
an application:</para>
<programlisting language="none">package sax;
...
public class NamespaceEventHandler extends DefaultHandler {
...
public void startElement(String <emphasis role="bold">namespaceUri</emphasis>, String localName,
String rawName, Attributes attrs) {
System.out.println("Opening Element rawName='" + rawName + "'\n"
+ "namespaceUri='" + <emphasis role="bold">namespaceUri</emphasis> + "'\n"
+ "localName='" + localName
+ "'\n--------------------------------------------");
}</programlisting>
<para>As an example we take an XSLT script:</para>
<programlisting language="none"><?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
xmlns:fo='http://www.w3.org/1999/XSL/Format'>
<xsl:template match="/">
<fo:block>A block</fo:block>
<HTML/>
</xsl:template>
</xsl:stylesheet></programlisting>
<para>This XSLT script, viewed as an XML document instance,
contains elements belonging to two different namespaces namely
<code>http://www.w3.org/1999/XSL/Transform</code> and
<code>http://www.w3.org/1999/XSL/Format</code>. The script also
contains a <quote>raw</quote> <tag audience=""
class="emptytag">HTML</tag> element being introduced only for
demonstration purposes. Since the script does not declare a default
namespace this element does not belong to any namespace. The result
reads:</para>
<programlisting language="none">Opening Element rawName='xsl:stylesheet'
namespaceUri='http://www.w3.org/1999/XSL/Transform'
localName='stylesheet'
--------------------------------------------
Opening Element rawName='xsl:template'
namespaceUri='http://www.w3.org/1999/XSL/Transform'
localName='template'
--------------------------------------------
Opening Element rawName='fo:block'
namespaceUri='http://www.w3.org/1999/XSL/Format'
localName='block'
--------------------------------------------
Opening Element rawName='HTML'
namespaceUri=''
localName='HTML'</programlisting>
<para>Now the parser tells us which namespace a given element node
belongs to. An XSLT engine for example uses this information to
distinguish two classes of elements:</para>
<itemizedlist>
<listitem>
<para>Elements belonging to the namespace
<code>http://www.w3.org/1999/XSL/Transform</code> like <tag
class="emptytag">xsl:value-of select="..."</tag> have to be
interpreted as instructions by the processor.</para>
</listitem>
<listitem>
<para>Elements <emphasis role="bold">not</emphasis> belonging to
the namespace <code>http://www.w3.org/1999/XSL/Transform</code>
like <tag class="emptytag">html</tag> or <tag
class="starttag">fo:block</tag> are copied <quote>as is</quote> to
the output.</para>
</listitem>
</itemizedlist>
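<para>The distinction between these two classes of elements may be
sketched by a handler which compares each element's namespace URI with
the XSLT namespace. The class <code>XslElementClassifier</code> is a
hypothetical example and by no means a real XSLT engine. It activates
namespace processing by calling
<code>SAXParserFactory.setNamespaceAware(true)</code>, the JAXP
counterpart of the <code>namespaces</code> feature:</para>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler separating XSLT instructions from all other elements. */
class XslElementClassifier extends DefaultHandler {
    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    final List<String> instructions = new ArrayList<>();
    final List<String> literalElements = new ArrayList<>();

    @Override
    public void startElement(String namespaceUri, String localName,
                             String rawName, Attributes attrs) {
        if (XSL_NS.equals(namespaceUri)) {
            instructions.add(localName);    // To be interpreted by a processor.
        } else {
            literalElements.add(localName); // To be copied "as is" to the output.
        }
    }

    /** Parses the given XML text with namespace processing being enabled. */
    static XslElementClassifier classify(String xml) {
        try {
            final SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            final XslElementClassifier handler = new XslElementClassifier();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        final String xsl =
              "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
            + "<xsl:template match='/'>"
            + "<fo:block>A block</fo:block>"
            + "<HTML/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        final XslElementClassifier result = classify(xsl);
        System.out.println("Instructions: " + result.instructions);
        System.out.println("Literal elements: " + result.literalElements);
    }
}
```

<para>Comparing namespace URIs rather than prefixes matters here: A
stylesheet author is free to choose a prefix other than
<code>xsl</code>.</para>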
<qandaset defaultlabel="qanda" xml:id="quandaentry_SqlFromXml">
<title>Generating SQL INSERT statements from XML data</title>
<qandadiv>
<qandaentry>
<question>
<para>Consider the following schema and document instance
example:</para>
<figure xml:id="catalogProductDescriptionsExample">
<title>A sample catalog containing products and
corresponding descriptions.</title>
<programlisting language="none"><xs:element name="catalog">
<xs:complexType>
<xs:sequence>
<xs:element ref="product" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="product">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="description" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="age" type="xs:int" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
<xs:attribute name="id" type="xs:ID" use="required"/>
</xs:complexType>
</xs:element></programlisting>
<programlisting language="none"><catalog ... xsi:noNamespaceSchemaLocation="catalog.xsd">
<product id="mpt">
<name>Monkey Picked Tea</name>
<description>Rare wild Chinese tea</description>
<description>Picked only by specially trained monkeys</description>
</product>
<product id="instantTent">
<name>4-Person Instant Tent</name>
<description>4-person, 1-room tent</description>
<description>Pre-attached tent poles</description>
<description>Exclusive WeatherTec system.</description>
<age>15</age>
</product>
</catalog></programlisting>
</figure>
<para>Data being contained in catalog instances shall be
transferred to a relational database system. Implement and
test a <link linkend="gloss_SAX"><abbrev>SAX</abbrev></link>
application by following the subsequently described
steps:</para>
<glosslist>
<glossentry>
<glossterm>Database schema</glossterm>
<glossdef>
<para>Create a database schema for a database product of
your choice (<productname>MySQL</productname>,
<productname>Oracle</productname>, ...). Your schema
should map the type and integrity constraints of the given
XML schema. In particular:</para>
<itemizedlist>
<listitem>
<para>The element <tag class="starttag">age</tag> is
optional.</para>
</listitem>
<listitem>
<para><tag class="starttag">description</tag>
elements are children of <tag class="starttag">product</tag> elements
and should thus be modeled by a 1:n relation.</para>
</listitem>
<listitem>
<para>In a catalog the order of descriptions of a
given product matters. Thus your schema should allow
for descriptions being ordered.</para>
</listitem>
</itemizedlist>
</glossdef>
</glossentry>
<glossentry>
<glossterm>SAX Application</glossterm>
<glossdef>
<para>The order of appearance of the XML elements <tag
class="starttag">product</tag>, <tag
class="starttag">name</tag> and <tag
class="starttag">age</tag> does not permit a linear
generation of suitable SQL <code>INSERT</code>
statements by a <link
linkend="gloss_SAX"><abbrev>SAX</abbrev></link> content
handler. Instead you will have to keep copies of local
element values when implementing
<methodname>org.xml.sax.ContentHandler.startElement(String,String,String,org.xml.sax.Attributes)</methodname>
and related callback methods. The following sequence of
insert statements corresponds to the XML data being
contained in <xref
linkend="catalogProductDescriptionsExample"/>. You may
use these statements as a blueprint to be generated by
your <link
linkend="gloss_SAX"><abbrev>SAX</abbrev></link>
application:</para>
<programlisting language="none"><emphasis role="bold">INSERT INTO Product VALUES ('mpt', 'Monkey Picked Tea', NULL);</emphasis>
INSERT INTO Description VALUES('mpt', 0, 'Rare wild Chinese tea');
INSERT INTO Description VALUES('mpt', 1, 'Picked only by specially trained monkeys');
<emphasis role="bold">INSERT INTO Product VALUES ('instantTent', '4-Person Instant Tent', 15);</emphasis>
INSERT INTO Description VALUES('instantTent', 0, '4-person, 1-room tent');
INSERT INTO Description VALUES('instantTent', 1, 'Pre-attached tent poles');
INSERT INTO Description VALUES('instantTent', 2, 'Exclusive WeatherTec system.');</programlisting>
<para>Provide a suitable <xref linkend="glo_Junit"/>
test.</para>
</glossdef>
</glossentry>
</glosslist>
</question>
<answer>
<annotation role="make">
<para role="eclipse">P/catalog2sql</para>
</annotation>
<para>Running this project and executing its tests requires the
following Maven project to be installed first (e.g.
locally via <command>mvn</command> <option>install</option>):</para>
<annotation role="make">
<para role="eclipse">P/saxerrorhandler</para>
</annotation>
<para>Some remarks are in order here:</para>
<orderedlist>
<listitem>
<para>The <xref linkend="glo_SQL"/> database schema might
read:</para>
<programlisting language="sql">CREATE TABLE Product (
id CHAR(20) NOT NULL PRIMARY KEY <co linkends="catalog2sqlSchema-1"
xml:id="catalog2sqlSchema-1-co"/>
,name VARCHAR(255) NOT NULL
,age SMALLINT <co linkends="catalog2sqlSchema-2"
xml:id="catalog2sqlSchema-2-co"/>
);
CREATE TABLE Description (
product CHAR(20) NOT NULL REFERENCES Product <co
linkends="catalog2sqlSchema-3"
xml:id="catalog2sqlSchema-3-co"/>
,orderIndex int NOT NULL <co linkends="catalog2sqlSchema-4"
xml:id="catalog2sqlSchema-4-co"/> -- preserving the order of descriptions belonging to a given product
,text VARCHAR(255) NOT NULL
,UNIQUE(product, orderIndex) <co linkends="catalog2sqlSchema-5"
xml:id="catalog2sqlSchema-5-co"/>
);</programlisting>
<calloutlist>
<callout arearefs="catalog2sqlSchema-1-co"
xml:id="catalog2sqlSchema-1">
<para>The primary key constraint implements the
uniqueness of <tag class="starttag">product
id='xyz'</tag> values.</para>
</callout>
<callout arearefs="catalog2sqlSchema-2-co"
xml:id="catalog2sqlSchema-2">
<para>Nullability of <code>age</code> implements <tag
class="starttag">age</tag> elements being
optional.</para>
</callout>
<callout arearefs="catalog2sqlSchema-3-co"
xml:id="catalog2sqlSchema-3">
<para><tag class="starttag">description</tag> elements
being children of <tag class="starttag">product</tag>
are implemented by a foreign key to their
identifying owner, thus forming weak entities.</para>
</callout>
<callout arearefs="catalog2sqlSchema-4-co"
xml:id="catalog2sqlSchema-4">
<para>The attribute <code>orderIndex</code> allows
descriptions to be sorted thus maintaining the
original order of appearance of <tag
class="starttag">description</tag> elements.</para>
</callout>
<callout arearefs="catalog2sqlSchema-5-co"
xml:id="catalog2sqlSchema-5">
<para>The <code>orderIndex</code> attribute is unique
within the set of descriptions belonging to the same
product.</para>
</callout>
</calloutlist>
</listitem>
<listitem>
<para>The result of the given input XML sample file should
be similar to the content of the supplied reference file
<filename>products.reference.xml</filename>:</para>
<programlisting language="sql">INSERT INTO Product (id, name) VALUES ('mpt', 'Monkey Picked Tea');
INSERT INTO Description VALUES('mpt', 0, 'Rare wild Chinese tea');
INSERT INTO Description VALUES('mpt', 1, 'Picked only by specially trained monkeys');
-- end of current product entry --
INSERT INTO Product VALUES ('instantTent', '4-Person Instant Tent', 15);
INSERT INTO Description VALUES('instantTent', 0, '4-person, 1-room tent');
INSERT INTO Description VALUES('instantTent', 1, 'Pre-attached tent poles');
INSERT INTO Description VALUES('instantTent', 2, 'Exclusive WeatherTec system.');
-- end of current product entry --</programlisting>
<para>So a <xref linkend="glo_Junit"/> test may just
execute the XML to SQL converter and then compare the
effective output to the above reference file.</para>
</listitem>
</orderedlist>
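<para>The buffering idea behind the remarks above may be sketched as
follows. This is a minimal illustration and not the code of the
referenced project: the class name
<classname>Catalog2SqlSketch</classname> and its helper methods are
hypothetical, the parser is not namespace aware and no schema
validation takes place. Values are kept until the closing <tag
class="endtag">product</tag> arrives, since only then is it known
whether an <tag class="starttag">age</tag> element exists:</para>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: buffers product data until </product> is reached. */
class Catalog2SqlSketch extends DefaultHandler {

  private final StringBuilder sql = new StringBuilder();
  private final StringBuilder text = new StringBuilder(); // character data of the current element
  private final List<String> descriptions = new ArrayList<>();
  private String id;
  private String name;
  private Integer age; // null if the optional <age> element is absent

  @Override
  public void startElement(String uri, String localName, String qName, Attributes atts) {
    text.setLength(0); // forget text collected so far
    if ("product".equals(qName)) {
      id = atts.getValue("id");
      name = null;
      age = null;
      descriptions.clear();
    }
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    text.append(ch, start, length); // may be called several times per text node
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    switch (qName) {
      case "name": name = text.toString(); break;
      case "description": descriptions.add(text.toString()); break;
      case "age": age = Integer.valueOf(text.toString().trim()); break;
      case "product": flushProduct(); break; // all values are known now
      default: break;
    }
  }

  private void flushProduct() {
    sql.append("INSERT INTO Product VALUES (").append(quote(id)).append(", ")
       .append(quote(name)).append(", ")
       .append(age == null ? "NULL" : age.toString()).append(");\n");
    for (int i = 0; i < descriptions.size(); i++) { // list index preserves document order
      sql.append("INSERT INTO Description VALUES(").append(quote(id)).append(", ")
         .append(i).append(", ").append(quote(descriptions.get(i))).append(");\n");
    }
  }

  /** Wraps a value in single quotes, doubling embedded quotes. */
  private static String quote(String s) {
    return "'" + s.replace("'", "''") + "'";
  }

  static String convert(String xml) throws Exception {
    Catalog2SqlSketch handler = new Catalog2SqlSketch();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return handler.sql.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.print(convert("<catalog><product id='mpt'><name>Monkey Picked Tea</name>"
        + "<description>Rare wild Chinese tea</description></product></catalog>"));
  }
}
```

<para>Returning the generated SQL as a string keeps the sketch easy
to compare against a reference file in a unit test. A production
converter would rather use prepared statements than string
concatenation.</para>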
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<qandaset defaultlabel="qanda" xml:id="quandaentry_NumElemByNs">
<title>Counting element names grouped by namespaces</title>
<qandadiv>
<qandaentry>
<question>
<para>We want to extend the SAX examples counting <link
linkend="saxElementCount">elements</link> and <link
linkend="exercise_saxAttrib">attributes</link> of arbitrary
document instances. Consider the following <xref
linkend="glo_XSL"/> stylesheet generating <link
linkend="gloss_XHTML">XHTML</link> output:</para>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" <co
xml:id="xhtmlCombinedNs_Svg"/>
xmlns:h="http://www.w3.org/1999/xhtml" <co xml:id="xhtmlCombinedNs_Xhtml"/>
exclude-result-prefixes="xs" version="2.0">
<xsl:template match="/">
<h:html>
<h:head>
<h:title></h:title>
</h:head>
<h:body>
<h:h1>A heading</h:h1>
<h:p>A paragraph</h:p>
<h:h1>Yet another heading</h:h1>
<xsl:apply-templates/>
</h:body>
</h:html>
</xsl:template>
<xsl:template match="*">
<xsl:message>
<xsl:text>No template defined for element '</xsl:text>
<xsl:value-of select="name(.)"/>
<xsl:text>'</xsl:text>
</xsl:message>
</xsl:template>
</xsl:stylesheet></programlisting>
<para>Apart from the XSL namespace itself this stylesheet declares
two further namespaces <coref linkend="xhtmlCombinedNs_Svg"/> and <coref
linkend="xhtmlCombinedNs_Xhtml"/>.</para>
<para>Implement a <link linkend="gloss_SAX">SAX</link>
application being able to group elements from arbitrary XML
documents by namespaces along with their corresponding
frequencies of occurrence. The intended output for the
previous <xref linkend="glo_XSL"/> example shall look
like:</para>
<programlisting language="none">Namespace '<emphasis
role="bold">http://www.w3.org/1999/xhtml</emphasis>' contains:
<head> (1 occurrence)
<p> (1 occurrence)
<h1> (2 occurrences)
<html> (1 occurrence)
<title> (1 occurrence)
<body> (1 occurrence)
Namespace '<emphasis role="bold">http://www.w3.org/1999/XSL/Transform</emphasis>' contains:
<stylesheet> (1 occurrence)
<template> (2 occurrences)
<value-of> (1 occurrence)
<apply-templates> (1 occurrence)
<text> (2 occurrences)
<message> (1 occurrence)</programlisting>
<para>Hint: Counting frequencies and grouping by namespaces
may be achieved by using standard Java container
implementations of <classname>java.util.Map</classname>. You
may for example map element names to their frequencies and group
these maps by their corresponding namespaces. Thus nested maps are
required.</para>
</question>
<answer>
<annotation role="make">
<para role="eclipse">P/catalog2sql</para>
</annotation>
<para>Running this project and executing its tests requires the
following Maven project to be installed first (e.g.
locally via <command>mvn</command> <option>install</option>):</para>
<annotation role="make">
<para role="eclipse">P/saxerrorhandler</para>
</annotation>
<para>The above solution contains both a running application
and an (incomplete) <xref linkend="glo_Junit"/> test.</para>
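<para>The nested map idea from the hint may be sketched as follows.
This is an illustration under assumptions rather than the project's
actual code: the class name
<classname>NamespaceElementCounter</classname> is hypothetical and
<classname>java.util.TreeMap</classname> is used, so namespaces and
element names appear in alphabetical order, which differs from the
order shown in the sample output above. The parser must be switched to
namespace aware mode, otherwise no namespace URIs are reported:</para>

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: counts elements grouped by namespace URI. */
class NamespaceElementCounter extends DefaultHandler {

  // namespace URI -> (local element name -> frequency)
  private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

  @Override
  public void startElement(String uri, String localName, String qName, Attributes atts) {
    counts.computeIfAbsent(uri, key -> new TreeMap<>()) // inner map per namespace
          .merge(localName, 1, Integer::sum);           // increment the frequency
  }

  static Map<String, Map<String, Integer>> count(String xml) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true); // required to receive uri and localName
    NamespaceElementCounter handler = new NamespaceElementCounter();
    factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
    return handler.counts;
  }

  static String report(Map<String, Map<String, Integer>> counts) {
    StringBuilder sb = new StringBuilder();
    counts.forEach((namespace, elements) -> {
      sb.append("Namespace '").append(namespace).append("' contains:\n");
      elements.forEach((name, n) -> sb.append("<").append(name).append("> (").append(n)
          .append(n == 1 ? " occurrence)\n" : " occurrences)\n"));
    });
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.print(report(count(
        "<a:root xmlns:a='urn:a' xmlns:b='urn:b'><b:x/><b:x/><a:y/></a:root>")));
  }
}
```

<para>Elements belonging to no namespace are reported with an empty
namespace URI and thus end up in a group of their own.</para>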
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
</section>
<section xml:id="dom">
<title>The Document Object Model (<acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym>)</title>
<titleabbrev><acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym></titleabbrev>
<section xml:id="domBase">
<title>Language independent specification</title>
<titleabbrev>Language independence</titleabbrev>
<para>XML documents allow for automated content processing. We already
discussed the <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> API to access XML
documents by <link
linkend="gloss_Java"><trademark>Java</trademark></link> applications.
There are however situations where <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> is not
appropriate:</para>
<itemizedlist>
<listitem>
<para><acronym
xlink:href="http://www.saxproject.org">SAX</acronym> is event
based: XML nodes are passed to handler methods one at a time. Sometimes
we want to access neighbouring nodes from a context node in our
handler methods, for example a <tag class="starttag">title</tag>
following a <tag class="starttag">chapter</tag> node. <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> does not
offer any support for this. If we need references to neighbouring
nodes we have to create them ourselves during a <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> parsing run.
This is tedious and leads to code that is hard to understand.</para>
</listitem>
<listitem>
<para>Some applications may want to select node sets by <acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym>
expressions. <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> itself offers
no support for this since it never builds a tree to evaluate such
expressions against.</para>
</listitem>
<listitem>
<para>We may want to move subtrees within a document itself (for
example exchanging two <tag class="starttag">chapter</tag> nodes)
or even transfer them to a different document.</para>
</listitem>
</itemizedlist>
<para>The greatest deficiency of <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> is the fact that
an XML instance is not represented as a tree-like structure but as a
succession of events. The <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> allows us to
represent XML document instances as tree-like structures and thus
enables navigational operations between nodes.</para>
<para>In order to achieve language <emphasis>and</emphasis> software
vendor independence the <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> approach uses two
stages:</para>
<itemizedlist>
<listitem>
<para>The <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> is formulated in
an Interface Definition Language (<abbrev
xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev>)</para>
</listitem>
<listitem>
<para>In order to use the <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> API from a concrete
programming language a so-called <emphasis>language
binding</emphasis> is required. In languages like <link
linkend="gloss_Java"><trademark>Java</trademark></link> the
language binding will still be a set of (<link
linkend="gloss_Java"><trademark>Java</trademark></link>)
interfaces. Thus for actually coding an application an
implementation of these interfaces is needed.</para>
</listitem>
</itemizedlist>
<para>So what exactly is an <abbrev
xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev>?
The programming language <link
linkend="gloss_Java"><trademark>Java</trademark></link> already allows
pure interface definitions without any implementation. In C++ the same
result can be achieved by abstract classes containing only <emphasis>pure virtual
functions</emphasis>. An <abbrev
xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev>
describes such interfaces independently of any particular programming language. For <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> the <productname
xlink:href="http://www.omg.org/gettingstarted/corbafaq.htm">CORBA
2.2</productname> <abbrev
xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev>
was chosen to describe an XML document programming interface. As
a first example we take an excerpt from the <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym>'s <link
xlink:href="http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-1950641247">Node</link>
interface definition:</para>
<programlisting language="none">interface Node {
// NodeType
const unsigned short ELEMENT_NODE = 1;
const unsigned short ATTRIBUTE_NODE = 2;
const unsigned short TEXT_NODE = 3;
...
readonly attribute DOMString nodeName;
attribute DOMString nodeValue;
// raises(DOMException) on setting
// raises(DOMException) on retrieval
readonly attribute unsigned short nodeType;
readonly attribute Node parentNode;
...
readonly attribute NodeList childNodes;
readonly attribute Node firstChild;
...
Node insertBefore(in Node newChild,
in Node refChild)
raises(DOMException);
...</programlisting>
<para>If we want to implement the <abbrev
xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev>
<classname>org.w3c.dom.Node</classname> specification in e.g. <link
linkend="gloss_Java"><trademark>Java</trademark></link> a language
binding has to be defined. This means writing <link
linkend="gloss_Java"><trademark>Java</trademark></link> code which
closely resembles the <abbrev
xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev>
specification. Obviously this task depends on and is restricted by the
constructs being offered by the target programming language. The W3C
<link
xlink:href="http://www.w3.org/TR/DOM-Level-3-Core/java-binding.html">defines</link>
the <link linkend="gloss_Java"><trademark>Java</trademark></link>
<classname>org.w3c.dom.Node</classname> interface by:</para>
<programlisting language="none">package org.w3c.dom;
public interface Node {
public static final short ELEMENT_NODE = 1; // Node Types
public static final short ATTRIBUTE_NODE = 2;
public static final short TEXT_NODE = 3;
...
public String getNodeName();
public String getNodeValue() throws DOMException;
public void setNodeValue(String nodeValue) throws DOMException;
public short getNodeType();
public Node getParentNode();
public NodeList getChildNodes();
public Node getFirstChild();
...
public Node insertBefore(Node newChild,
Node refChild)
throws DOMException;
...
}</programlisting>
<para>We take
<methodname>org.w3c.dom.Node.getChildNodes()</methodname> as an
example:</para>
<figure xml:id="domRetrieveChildren">
<title>Retrieving child nodes of a given context node</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/domtree.fig" scale="65"/>
</imageobject>
</mediaobject>
</figure>
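<para>A short sketch may illustrate
<methodname>org.w3c.dom.Node.getChildNodes()</methodname> in practice.
It relies on the <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> implementation
bundled with the JDK. The class name
<classname>ChildNodesSketch</classname> and the sample document are
assumptions made for illustration only. Note that the returned list
contains all kinds of child nodes, including text nodes, so element
nodes have to be filtered by their node type:</para>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: retrieving the children of a context node. */
class ChildNodesSketch {

  /** Returns the names of all element children of the given context node. */
  static List<String> childElementNames(Node context) {
    List<String> names = new ArrayList<>();
    NodeList children = context.getChildNodes(); // elements, text nodes, comments, ...
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      if (child.getNodeType() == Node.ELEMENT_NODE) { // skip text and other node types
        names.add(child.getNodeName());
      }
    }
    return names;
  }

  static Document parse(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
  }

  public static void main(String[] args) throws Exception {
    Document doc = parse("<chapter><title>Intro</title> <para>Text</para></chapter>");
    Node chapter = doc.getDocumentElement();
    // Three children: <title>, a whitespace text node and <para>
    System.out.println(chapter.getChildNodes().getLength());
    System.out.println(childElementNames(chapter));
  }
}
```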
<para>The <classname>org.w3c.dom.Node</classname> interface offers a
set of common operations for objects being part of an XML document. But
an XML document tree contains different types of nodes such as:</para>
<itemizedlist>
<listitem>
<para>Elements</para>
</listitem>
<listitem>
<para>Attributes</para>
</listitem>
<listitem>
<para>Entities</para>
</listitem>
</itemizedlist>
<para>An XML API may address this issue by offering data types to
represent these different kinds of nodes. The <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> <link
linkend="gloss_Java"><trademark>Java</trademark></link> Binding
defines an inheritance hierarchy of interfaces for this
purpose:</para>
<figure xml:id="domJavaNodeInterfaces">
<title>Inheritance interface hierarchy in the <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> <link
linkend="gloss_Java"><trademark>Java</trademark></link>
binding</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/nodeHierarchy.svg"/>
</imageobject>
</mediaobject>
</figure>
<para>Two commonly used <link
linkend="gloss_Java"><trademark>Java</trademark></link>
implementations of these interfaces are:</para>
<variablelist>
<varlistentry>
<term>Xerces</term>
<listitem>
<para><orgname
xlink:href="http://xml.apache.org/xerces2-j">Apache Software
Foundation</orgname></para>
</listitem>
</varlistentry>
<varlistentry>
<term>JAXP</term>
<listitem>
<para><orgname xlink:href="http://java.sun.com/xml/jaxp">Sun
Microsystems</orgname></para>
</listitem>
</varlistentry>
</variablelist>
<para>Both implementations offer additional interfaces beyond the
<acronym xlink:href="http://www.w3.org/DOM">DOM</acronym>'s
scope.</para>
<para>Going back to the <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> itself the
specification is divided into <link
xlink:href="http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/introduction.html#DOMArchitecture-h2">modules</link>:</para>
<figure xml:id="figureDomModules">
<title><acronym xlink:href="http://www.w3.org/DOM">DOM</acronym>
modules.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Screen/dom-architecture.screen.png"/>
</imageobject>
</mediaobject>
</figure>
</section>
<section xml:id="domCreate">
<title>Creating a new document from scratch</title>
<titleabbrev>New document</titleabbrev>
<para>If we want to export non-XML content (e.g. from an RDBMS) into
XML we may achieve this by the following recipe:</para>
<orderedlist>
<listitem>
<para>Create a document builder instance.</para>
</listitem>
<listitem>
<para>Create an empty <link
xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/Document.html">Document</link>
instance.</para>
</listitem>
<listitem>
<para>Fill in the desired Elements and Attributes.</para>
</listitem>
<listitem>
<para>Create a serializer.</para>
</listitem>
<listitem>
<para>Serialize the resulting tree to a stream.</para>
</listitem>
</orderedlist>
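<para>An introductory piece of code illustrates these steps using the
<productname>JDOM</productname> library. JDOM creates nodes by plain
constructor calls, so no document builder is required here:</para>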
<figure xml:id="simpleDomCreate">
<title>Creation of an XML document instance from scratch.</title>
<programlisting language="none">package dom;
import org.jdom2.Element;
import org.jdom2.Text;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;
public class CreateDoc {
public static void main(String[] args) throws Exception {
// Create the root element
<emphasis role="bold">final Element titel = new Element("titel");</emphasis>
//Set a date
<emphasis role="bold">titel.setAttribute("date", "23.02.2000");</emphasis>
// Append a text node as child
<emphasis role="bold">titel.addContent(new Text("Versuch 1"));</emphasis>
// Set formatting for the XML output
<emphasis role="bold">final Format outFormat = Format.getPrettyFormat();</emphasis>
// Serialize to console
<emphasis role="bold">final XMLOutputter printer = new XMLOutputter(outFormat);
printer.output(titel, System.out);</emphasis>
}
}</programlisting>
</figure>
<para>We get the following result:</para>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<titel date="23.02.2000">Versuch 1</titel></programlisting>
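<para>For comparison the five steps of the recipe may also be
followed literally with the <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> implementation
bundled with the JDK. The following is a minimal sketch under
assumptions: the class name
<classname>CreateDocJaxpSketch</classname> is hypothetical and an
identity transformation serves as serializer, which is one common
choice among several:</para>

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch: the creation recipe using the JDK's own DOM support. */
class CreateDocJaxpSketch {

  static String create() throws Exception {
    // Step 1: create a document builder instance
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

    // Step 2: create an empty Document instance
    Document doc = builder.newDocument();

    // Step 3: fill in elements and attributes; nodes are created by the document
    Element titel = doc.createElement("titel");
    titel.setAttribute("date", "23.02.2000");
    titel.appendChild(doc.createTextNode("Versuch 1"));
    doc.appendChild(titel);

    // Step 4: create a serializer (a transformer without stylesheet copies its input)
    Transformer serializer = TransformerFactory.newInstance().newTransformer();
    serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

    // Step 5: serialize the resulting tree to a stream
    StringWriter out = new StringWriter();
    serializer.transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(create());
  }
}
```

<para>In contrast to JDOM, nodes are not created by constructors but
by factory methods of the owning document, which ties each node to
its document from the start.</para>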
</section>
<section xml:id="domCreateExercises">
<title>Exercises</title>
<qandaset defaultlabel="qanda" xml:id="createDocModify">
<title>A substructured <tag class="starttag">title</tag></title>
<qandadiv>
<qandaentry>
<question>
<label>Creation of an extended XML document instance</label>
<para>In order to run the examples given during the lecture
the <filename
xlink:href="http://www.jdom.org/downloads">jdom2.jar</filename>
library must be added to the <envar>CLASSPATH</envar>.</para>
<para>The <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> creation
example given before may be used as a starting point. Extend
the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym>
tree created in <xref linkend="simpleDomCreate"/> to produce
an extended XML document:</para>
<programlisting language="none"><title>
<long>The long version of this title</long>
<short>Short version</short>
</title></programlisting>
</question>
<answer>
<programlisting language="none">package dom;
...
public class CreateExtended {
/**
* @param args
* @throws IOException
*/
public static void main(String[] args) throws IOException {
final Element titel = new Element("title"),
tLong = new Element("long"),
tShort = new Element("short");
<emphasis role="bold">// Append <long> and <short> to parent <title></emphasis>
titel.addContent(tLong).addContent(tShort);
<emphasis role="bold">// Append text to <long> and <short></emphasis>
tLong.addContent(new Text("The long version of this title"));
tShort.addContent(new Text("Short version"));
<emphasis role="bold">// Set formatting for the XML output</emphasis>
Format outFormat = Format.getPrettyFormat();
<emphasis role="bold">// Serialize to console</emphasis>
final XMLOutputter printer = new XMLOutputter(outFormat);
printer.output(titel, System.out);
}
}</programlisting>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="domParse">
<title>Parsing existing XML documents</title>
<titleabbrev>Parsing</titleabbrev>
<para>We already used <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> to parse an XML
document. Rather than handling <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> events ourselves
these events may be used to construct a <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> representation of our
document. This work is done by an instance of
<classname>org.jdom2.input.SAXBuilder</classname>. We use our catalog
example from <xref linkend="simpleCatalog"/> as an introductory
example.</para>
<para>We already noticed the need for an
<classname>org.xml.sax.ErrorHandler</classname> object during <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> processing. A
<acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> parser
requires a similar type of object in order to react to parsing errors
in a meaningful way. In principle the implementor of a <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> parser is
free to choose any implementation strategy, but most implementations are built
on top of a <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> parser. For this
reason it was natural to choose a <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> error handling
interface which is similar to a <acronym
xlink:href="http://www.saxproject.org">SAX</acronym>
<classname>org.xml.sax.ErrorHandler</classname>. The following code
serves the needs described before:</para>
<figure xml:id="domTreeTraversal">
<title>Accessing an XML tree purely by <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> methods.</title>
<programlisting language="none">package dom;
...
public class ArticleOrder {
<emphasis role="bold"> // Though we are playing DOM here, a <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> parser still
// assembles our DOM tree.</emphasis>
private SAXBuilder builder = new SAXBuilder();
public ArticleOrder() {
<emphasis role="bold">// Though an ErrorHandler is not strictly required it allows
// for easier localization of XML document errors</emphasis>
builder.setErrorHandler(new MySaxErrorHandler(System.out));<co
linkends="domSetSaxErrorHandler-co"
xml:id="domSetSaxErrorHandler"/>
}
/** Descending a catalog till its <item> elements. For each product
* its name and order number are being written to the output.
* @throws ...
*/
public void process(final String filename) throws JDOMException, IOException {
<emphasis role="bold">// Parsing our XML file</emphasis>
final Document docInput = builder.build(filename);
<emphasis role="bold">// Accessing the document's root element</emphasis>
final Element docRoot = docInput.getRootElement();
<emphasis role="bold">// Accessing the <item> children of parent element <catalog></emphasis>
final List<Element> items = docRoot.getChildren(); // Element nodes only
for (final Element item : items) {
System.out.println("Article: " + item.getText()
+ ", order number: " + item.getAttributeValue("orderNo"));
} ...</programlisting>
<para>Note <coref linkend="domSetSaxErrorHandler"
xml:id="domSetSaxErrorHandler-co"/>: This is our standard <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> error handler
implementing the <classname>org.xml.sax.ErrorHandler</classname>
interface.</para>
</figure>
<para>Executing this method needs a driver instance providing an input
XML filename:</para>
<programlisting language="none">package dom;
...
public class ArticleOrderDriver {
public static void main(String[] argv) throws Exception {
final ArticleOrder ao = new ArticleOrder();
ao.process("<emphasis role="bold">Input/article.xml</emphasis>");
}
}</programlisting>
<para>This yields:</para>
<programlisting language="none">Article: Swinging headset, order number: 3218
Article: 200W Stereo Amplifier, order number: 9921</programlisting>
<para>To illustrate the internal processes we take a look at the
sequence diagram:</para>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/sequenceDomParser.svg"/>
</imageobject>
</mediaobject>
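<para>The same processing may be sketched with the parser bundled
with the JDK instead of JDOM. The sketch rests on assumptions: the
class name <classname>ArticleOrderJaxpSketch</classname> is
hypothetical, and the catalog is taken to consist of <tag
class="starttag">item</tag> elements carrying an
<code>orderNo</code> attribute, as suggested by the code and output
above. An <classname>org.xml.sax.ErrorHandler</classname> is
registered with the <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> parser just as it
would be with a <acronym
xlink:href="http://www.saxproject.org">SAX</acronym> parser:</para>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical sketch: parsing into a DOM tree with a SAX style error handler. */
class ArticleOrderJaxpSketch {

  static List<String> articles(String xml) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

    // The DOM parser reports problems through the SAX ErrorHandler interface
    builder.setErrorHandler(new ErrorHandler() {
      @Override
      public void warning(SAXParseException e) {
        System.err.println("Warning, line " + e.getLineNumber() + ": " + e.getMessage());
      }
      @Override
      public void error(SAXParseException e) throws SAXException {
        throw e; // treat recoverable errors as failures
      }
      @Override
      public void fatalError(SAXParseException e) throws SAXException {
        throw e;
      }
    });

    // Parsing builds the complete tree before any node is accessed
    Document doc = builder.parse(new InputSource(new StringReader(xml)));
    NodeList items = doc.getDocumentElement().getElementsByTagName("item");

    List<String> result = new ArrayList<>();
    for (int i = 0; i < items.getLength(); i++) {
      Element item = (Element) items.item(i);
      result.add("Article: " + item.getTextContent()
          + ", order number: " + item.getAttribute("orderNo"));
    }
    return result;
  }

  public static void main(String[] args) throws Exception {
    articles("<catalog><item orderNo='3218'>Swinging headset</item>"
        + "<item orderNo='9921'>200W Stereo Amplifier</item></catalog>")
        .forEach(System.out::println);
  }
}
```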
<qandaset defaultlabel="qanda" xml:id="exercise_domHtmlSimple">
<title>Creating HTML output</title>
<qandadiv>
<qandaentry>
<question>
<label>Simple HTML output</label>
<para>Instead of exporting simple text output as in <xref
linkend="domTreeTraversal"/> we may also create HTML pages
like:</para>
<programlisting language="none"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Available articles</title>
</head>
<body>
<h1>Available articles</h1>
<table>
<tbody>
<tr>
<th align="left">Article Description</th><th>Order Number</th>
</tr>
<tr>
<td align="left"><emphasis role="bold">Swinging headset</emphasis></td><td><emphasis
role="bold">3218</emphasis></td>
</tr>
<tr>
<td align="left"><emphasis role="bold">200W Stereo Amplifier</emphasis></td><td><emphasis
role="bold">9921</emphasis></td>
</tr>
</tbody>
</table>
</body>
</html></programlisting>
<para>Instead of simply writing
<code>...println(<html>\n\t<head>...)</code>
statements you are expected to code a more sophisticated
solution. We may combine <xref linkend="createDocModify"/> and
<xref linkend="domTreeTraversal"/>. The idea is reading the XML
catalog instance as a <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> as before.
Then construct a <emphasis>second</emphasis> <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> tree for the
desired HTML output and fill in the article information from
the first <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> tree
accordingly.</para>
</question>
<answer>
<para>We introduce a class
<classname>solve.dom.HtmlTree</classname>:</para>
<programlisting language="none">package solve.dom;
import java.io.IOException;
import java.io.PrintStream;
import org.jdom2.DocType;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.Text;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;
/**
* Holding a HTML DOM to produce output.
* @author goik
*/
public class HtmlTree {
private Document htmlOutput;
private Element tableBody;
public HtmlTree(final String titleText,
final String[] tableHeaderFields) { <co
linkends="programlisting_catalog2html_htmlskel_co"
xml:id="programlisting_catalog2html_htmlskel"/>
DocType doctype = new DocType("html",
"-//W3C//DTD XHTML 1.0 Strict//EN",
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd");
final Element htmlRoot = new Element("html"); <co
linkends="programlisting_catalog2html_tablehead_co"
xml:id="programlisting_catalog2html_tablehead"/>
htmlOutput = new Document(htmlRoot);
htmlOutput.setDocType(doctype);
// We create a HTML skeleton including an "empty" table
final Element head = new Element("head"),
body = new Element("body"),
table = new Element("table");
htmlRoot.addContent(head).addContent(body);
head.addContent(new Element("title").addContent(new Text(titleText)));
body.addContent(new Element("h1").addContent(new Text(titleText)));
body.addContent(table);
tableBody = new Element("tbody");
table.addContent(tableBody);
// addContent() returns the parent element, so the new row is kept separately
final Element tr = new Element("tr");
tableBody.addContent(tr);
for (final String headerField: tableHeaderFields) {
tr.addContent(new Element("th").addContent(new Text(headerField)));
}
}
public void appendItem(final String itemName, final String orderNo) {<co
linkends="programlisting_catalog2html_insertproduct_co"
xml:id="programlisting_catalog2html_insertproduct"/>
final Element tr = new Element("tr");
tableBody.addContent(tr);
tr.addContent(new Element("td").addContent(new Text(itemName)));
tr.addContent(new Element("td").addContent(new Text(orderNo)));
}
public void serialize(PrintStream out){
// Set formatting for the XML output
final Format outFormat = Format.getPrettyFormat();
// Serialize to the given stream
final XMLOutputter printer = new XMLOutputter(outFormat);
try {
printer.output(htmlOutput, out);
} catch (IOException e) {
e.printStackTrace();
System.exit(1);
}
}
/**
* @return the table's <tbody> element
*/
public Element getTable() {
return tableBody;
}
}
</programlisting>
<calloutlist>
<callout arearefs="programlisting_catalog2html_htmlskel"
xml:id="programlisting_catalog2html_htmlskel_co">
<para>A basic HTML skeleton is being created:</para>
<programlisting language="none"><?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Available articles</title>
</head>
<body>
<h1>Available articles</h1>
<table>
<emphasis role="bold"><tbody></emphasis> <!-- Data to be inserted here in next step -->
<emphasis role="bold"></tbody></emphasis>
</table>
</body>
</html></programlisting>
<para>The table containing the product's data is empty at
this point and thus invalid.</para>
</callout>
<callout arearefs="programlisting_catalog2html_tablehead"
xml:id="programlisting_catalog2html_tablehead_co">
<para>The table's header is appended but the actual data
from our two products is still missing:</para>
<programlisting language="none">... <h1>Available articles</h1>
<table>
<tbody>
<tr>
<th>Article Description</th>
<th>Order Number</th>
<emphasis role="bold"></tr></emphasis><!-- Data to be appended after this row in next step -->
<emphasis role="bold"></tbody></emphasis>
</table> ...</programlisting>
</callout>
<callout arearefs="programlisting_catalog2html_insertproduct"
xml:id="programlisting_catalog2html_insertproduct_co">
<para>Calling
<methodname>solve.dom.HtmlTree.appendItem(String,String)</methodname>
once per product completes the creation of our HTML DOM
tree:</para>
<programlisting language="none">... </tr>
<tr>
<td>Swinging headset</td>
<td>3218</td>
</tr>
<tr>
<td>200W Stereo Amplifier</td>
<td>9921</td>
</tr>
</tbody> ...</programlisting>
</callout>
</calloutlist>
<para>The class <classname>solve.dom.Article2Html</classname>
reads the catalog data:</para>
<programlisting language="none">package solve.dom;
...
public class Article2Html {
private final SAXBuilder builder = new SAXBuilder();
private final HtmlTree htmlResult;
public Article2Html() {
builder.setErrorHandler(new MySaxErrorHandler(System.out));
htmlResult = new HtmlTree("Available articles", new String[] { <co
linkends="programlisting_catalog2html_glue_createhtmldom_co"
xml:id="programlisting_catalog2html_glue_createhtmldom"/>
"Article Description", "Order Number" });
}
/** Read an XML catalog instance and insert product names along with their
* order numbers into the HTML DOM. Then serialize the HTML tree to a stream.
*
* @param filename
* Name of the XML source file.
* @param out
* The output stream for HTML serialization.
* @throws IOException
* @throws JDOMException
*/
public void process(final String filename, final PrintStream out) throws JDOMException, IOException{
final List<Element> items =
builder.build(filename).getRootElement().getChildren();
for (final Element item : items) { <co
linkends="programlisting_catalog2html_glue_prodloop_co"
xml:id="programlisting_catalog2html_glue_prodloop"/>
htmlResult.appendItem(item.getText(), item.getAttributeValue("orderNo")); <co
linkends="programlisting_catalog2html_glue_insertprod_co"
xml:id="programlisting_catalog2html_glue_insertprod"/>
}
htmlResult.serialize(out); <co
linkends="programlisting_catalog2html_glue_serialize_co"
xml:id="programlisting_catalog2html_glue_serialize"/>
}
}</programlisting>
<calloutlist>
<callout arearefs="programlisting_catalog2html_glue_createhtmldom"
xml:id="programlisting_catalog2html_glue_createhtmldom_co">
<para>Create an instance holding a HTML <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> with a
table header containing the strings <emphasis>Article
Description</emphasis> and <emphasis>Order
Number</emphasis>.</para>
</callout>
<callout arearefs="programlisting_catalog2html_glue_prodloop"
xml:id="programlisting_catalog2html_glue_prodloop_co">
<para>Iterate over all product nodes.</para>
</callout>
<callout arearefs="programlisting_catalog2html_glue_insertprod"
xml:id="programlisting_catalog2html_glue_insertprod_co">
<para>Insert the product's name and order number into the
HTML <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym>.</para>
</callout>
<callout arearefs="programlisting_catalog2html_glue_serialize"
xml:id="programlisting_catalog2html_glue_serialize_co">
<para>Serialize the completed HTML <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> tree to
the output stream.</para>
</callout>
</calloutlist>
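<para>The three construction steps described above can also be sketched
without JDOM. The following class is a hypothetical stand-in for
<classname>solve.dom.HtmlTree</classname>, whose implementation is not
shown here. It assumes the <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> and transformation
classes bundled with the JDK, and the name
<classname>HtmlTableSketch</classname> is an invention for
illustration:</para>

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical stand-in for solve.dom.HtmlTree using the JDK's DOM API (not JDOM). */
class HtmlTableSketch {
    private final Document doc;
    private final Element tbody;

    /** Steps 1 and 2: create the HTML skeleton and the table's header row. */
    HtmlTableSketch(String title, String[] headers) {
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
        Element html = doc.createElement("html");
        doc.appendChild(html);
        Element head = append(html, "head", null);
        append(head, "title", title);
        Element body = append(html, "body", null);
        append(body, "h1", title);
        Element table = append(body, "table", null);
        tbody = append(table, "tbody", null);
        Element headerRow = append(tbody, "tr", null);
        for (String header : headers) {
            append(headerRow, "th", header);
        }
    }

    /** Step 3: append one table row per product. */
    void appendItem(String description, String orderNumber) {
        Element row = append(tbody, "tr", null);
        append(row, "td", description);
        append(row, "td", orderNumber);
    }

    /** Create a child element, optionally holding text, and attach it to its parent. */
    private Element append(Element parent, String name, String text) {
        Element child = doc.createElement(name);
        if (text != null) {
            child.setTextContent(text);
        }
        parent.appendChild(child);
        return child;
    }

    /** Serialize the tree by means of an identity transformation. */
    String serialize() {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.METHOD, "xml");
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        HtmlTableSketch tree = new HtmlTableSketch("Available articles",
                new String[] {"Article Description", "Order Number"});
        tree.appendItem("Swinging headset", "3218");
        tree.appendItem("200W Stereo Amplifier", "9921");
        System.out.println(tree.serialize());
    }
}
```

<para>The sketch omits the XHTML namespace and the document type
declaration, so it illustrates only the order of the construction
steps and is not a complete replacement.</para>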
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="domJavaScript">
<title>Using <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym>
with HTML/Javascript</title>
<para>Due to script language support in a variety of browsers we may
also use the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym>
to implement client side event handling. As an example we <link
xlink:href="Ref/src/tablesort.html">demonstrate</link> how an HTML
table can be made sortable by clicking on a column's header. The
example code along with its description can be found at <uri
xlink:href="http://www.kryogenix.org/code/browser/sorttable">http://www.kryogenix.org/code/browser/sorttable</uri>.</para>
<para>Quite remarkably only a few ingredients are required to
enrich an ordinary static HTML table with this functionality:</para>
<itemizedlist>
<listitem>
<para>An external Javascript library has to be included via
<code><script type="text/javascript"
src="sorttable.js"></code></para>
</listitem>
<listitem>
<para>Each sortable HTML table needs:</para>
<itemizedlist>
<listitem>
<para>A unique <code>id</code> attribute</para>
</listitem>
<listitem>
<para>A <code>class="sortable"</code> attribute</para>
</listitem>
</itemizedlist>
</listitem>
</itemizedlist>
</section>
<section xml:id="domXpath">
<title>Using <acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym></title>
<para><xref linkend="domTreeTraversal"/> demonstrated the possibility
to traverse trees solely by using <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> method calls. Though
this approach is possible it will in general not lead to stable
applications. Real world examples are often based on large XML
documents with complex hierarchical structures. Accessing the desired
node sets with this rather primitive approach requires deeply nested
method calls. In addition changing the conceptual schema will require
rewriting large portions of the code.</para>
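<para>The difference can be illustrated by a small sketch. It assumes
the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> and
<acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym>
classes bundled with the JDK rather than JDOM. The nested
<code>group</code> and <code>item</code> structure as well as the class
name are made up for illustration. Both methods collect the same order
numbers, but only the traversal variant has to mirror each nesting
level in code:</para>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TraversalVersusXPath {
    // Made-up sample data: items are nested one level below the root's children.
    static final String CATALOG =
          "<catalog>"
        + "<group><item orderNo='3218'>Swinging headset</item></group>"
        + "<group><item orderNo='9921'>200W Stereo Amplifier</item></group>"
        + "</catalog>";

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** One loop per nesting level: any schema change forces a rewrite. */
    static List<String> byTraversal(String xml) {
        try {
            List<String> result = new ArrayList<>();
            NodeList groups = parse(xml).getDocumentElement().getChildNodes();
            for (int i = 0; i < groups.getLength(); i++) {
                if (!(groups.item(i) instanceof Element)) {
                    continue; // skip whitespace text nodes and comments
                }
                NodeList items = groups.item(i).getChildNodes();
                for (int j = 0; j < items.getLength(); j++) {
                    Node candidate = items.item(j);
                    if (candidate instanceof Element && "item".equals(candidate.getNodeName())) {
                        result.add(((Element) candidate).getAttribute("orderNo"));
                    }
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Traversal failed", e);
        }
    }

    /** The path expression states the structure declaratively in one place. */
    static List<String> byXPath(String xml) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/catalog/group/item/@orderNo", parse(xml), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                result.add(hits.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(byTraversal(CATALOG));
        System.out.println(byXPath(CATALOG));
    }
}
```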
<para>As we already know from <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> transformations
<code>XPath</code> allows us to address node sets inside an XML tree. The
role of <acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> can be
compared to SQL queries when working with relational databases.
<acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> may
also be used within <link
linkend="gloss_Java"><trademark>Java</trademark></link> code. As a
first example we show an application extracting image filenames
from XHTML documents. The following example contains three
<tag class="starttag">img</tag> elements:</para>
<figure xml:id="htmlGallery">
<title>An HTML document containing <code>img</code> elements.</title>
<programlisting language="none"><?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>Picture gallery</title>
</head>
<body>
<h1>Picture gallery</h1>
<p>Images may appear inline:<emphasis role="bold"><img src="inline.gif" alt="none"/></emphasis></p>
<table>
<tbody>
<tr>
<td>Number one:</td>
<td><emphasis role="bold"><img src="one.gif" alt="none"/></emphasis></td>
</tr>
<tr>
<td>Number two:</td>
<td><emphasis role="bold"><img src="http://www.hdm-stuttgart.de/favicon.ico" alt="none"/></emphasis></td>
</tr>
</tbody>
</table>
</body>
</html>
</programlisting>
</figure>
<para>A given HTML document may contain <tag
class="starttag">img</tag> elements at <emphasis>arbitrary</emphasis>
positions. It is sometimes desirable to check for existence and
accessibility of such external objects, which are necessary for the page's
correct rendering. A simple XSL script will do the first part of the job,
namely extracting the <tag class="starttag">img</tag> elements:</para>
<figure xml:id="gallery2imagelist">
<title>An <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> script for
image name extraction.</title>
<programlisting language="none"><xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:html="http://www.w3.org/1999/xhtml">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:for-each select="//html:img">
<xsl:value-of select="@src"/>
<xsl:text> </xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet></programlisting>
</figure>
<para>Note the necessity for <code>html</code> namespace inclusion
into the <acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> expression in
<code><xsl:for-each select="//html:img"></code>. A simple
<code>select="//img"</code> results in an empty node set.
Executing the <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> script yields a
list of the image references contained in the HTML page i.e.
<code>inline.gif one.gif http://www.hdm-stuttgart.de/favicon.ico</code>.</para>
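<para>The same namespace effect can be reproduced from <link
linkend="gloss_Java"><trademark>Java</trademark></link>. The following
sketch assumes the JDK's own <acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> implementation
with a namespace aware parser instead of an XSL processor. The class
name and the shortened sample page are made up for illustration. The
prefix <code>html</code> is bound by a
<classname>javax.xml.namespace.NamespaceContext</classname>
instance:</para>

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceAwareXPath {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Shortened sample page: all elements belong to the XHTML namespace.
    static final String PAGE =
          "<html xmlns='" + XHTML_NS + "'><body>"
        + "<p><img src='inline.gif' alt='none'/></p>"
        + "<p><img src='one.gif' alt='none'/></p>"
        + "</body></html>";

    /** Count the nodes selected by an XPath expression within PAGE. */
    static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the prefix "html" to the XHTML namespace URI.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "html".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "html" : null;
                }
                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return XHTML_NS.equals(uri)
                            ? Collections.singletonList("html").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            NodeList hits = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("//img      selects " + count("//img") + " node(s)");
        System.out.println("//html:img selects " + count("//html:img") + " node(s)");
    }
}
```

<para>In this sketch the unprefixed expression selects no nodes because
an unprefixed name in <acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> 1.0 always
refers to an element without namespace.</para>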
<para>Now we want to write a <link
linkend="gloss_Java"><trademark>Java</trademark></link> application
which checks whether these referenced image files exist
and have sufficient permissions to be accessed. A simple approach may
pipe the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev>
output to our application which then executes the readability checks.
Instead we want to incorporate the <acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> based search
into the application. Ignoring namespaces and trying to resemble the
<abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> actions
as closely as possible our application will have to search for <link
xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/Element.html">Element</link>
nodes by the <acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> expression
<code>//img</code>:</para>
<figure xml:id="domFindImages">
<title>Extracting <tag class="emptytag">img</tag> element image
references from a HTML document.</title>
<programlisting language="none">package dom.xpath;
...
public class DomXpath {
private final SAXBuilder builder = new SAXBuilder();
public DomXpath() {
builder.setErrorHandler(new MySaxErrorHandler(System.err));
}
public void process(final String xhtmlFilename) throws JDOMException, IOException {
final Document htmlInput = builder.build(xhtmlFilename);<co
linkends="programlisting_java_searchimg_parse_co"
xml:id="programlisting_java_searchimg_parse"/>
final XPathExpression<Object> xpath = XPathFactory.instance().compile( "//img" ); <co
linkends="programlisting_java_searchimg_pf_co"
xml:id="programlisting_java_searchimg_pf"/> <co
linkends="programlisting_java_searchimg_newxpath_co"
xml:id="programlisting_java_searchimg_newxpath"/>
final List<Object> images = xpath.evaluate(htmlInput);<co
linkends="programlisting_java_searchimg_execquery_co"
xml:id="programlisting_java_searchimg_execquery"/>
for (Object o: images) { <co
linkends="programlisting_java_searchimg_loop_co"
xml:id="programlisting_java_searchimg_loop"/>
final Element image = (Element ) o;<co
linkends="programlisting_java_searchimg_cast_co"
xml:id="programlisting_java_searchimg_cast"/>
System.out.print(image.getAttributeValue("src") + " ");
}
}
}</programlisting>
<caption>
<para>This application searches for <tag
class="emptytag">img</tag> elements and shows their
<code>src</code> attribute value.</para>
</caption>
</figure>
<calloutlist>
<callout arearefs="programlisting_java_searchimg_parse"
xml:id="programlisting_java_searchimg_parse_co">
<para>Parse an XHTML document instance into a DOM tree.</para>
</callout>
<callout arearefs="programlisting_java_searchimg_pf"
xml:id="programlisting_java_searchimg_pf_co">
<para>Create an <acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym>
factory.</para>
</callout>
<callout arearefs="programlisting_java_searchimg_newxpath"
xml:id="programlisting_java_searchimg_newxpath_co">
<para>Create an <acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> query
instance. This may be used to search for a set of nodes starting
from a context node.</para>
</callout>
<callout arearefs="programlisting_java_searchimg_execquery"
xml:id="programlisting_java_searchimg_execquery_co">
<para>Using the document's root node as the context node we search
for <tag class="starttag">img</tag> elements appearing at
arbitrary positions in our document.</para>
</callout>
<callout arearefs="programlisting_java_searchimg_loop"
xml:id="programlisting_java_searchimg_loop_co">
<para>We iterate over the retrieved list of images.</para>
</callout>
<callout arearefs="programlisting_java_searchimg_cast"
xml:id="programlisting_java_searchimg_cast_co">
<para>Casting to the correct type.</para>
</callout>
</calloutlist>
<para>The result is a list of image filename references:</para>
<programlisting language="none">inline.gif one.gif http://www.hdm-stuttgart.de/favicon.ico </programlisting>
<qandaset defaultlabel="qanda" xml:id="quandaentry_CastAlwaysLegal">
<title>Legal casting?</title>
<qandadiv>
<qandaentry>
<question>
<para>Why is the cast in <coref
linkend="programlisting_java_searchimg_cast"/> in <xref
linkend="domFindImages"/> guaranteed to never cause a
<classname>java.lang.ClassCastException</classname>?</para>
</question>
<answer>
<para>The <acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym>
<code>//img</code> expression is guaranteed to return only
<tag class="starttag">img</tag> elements. Thus within our
<link linkend="gloss_Java"><trademark>Java</trademark></link>
context we are sure to find only
<classname>org.jdom2.Element</classname> instances.</para>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<qandaset defaultlabel="qanda" xml:id="exercise_htmlImageVerify">
<title>Verifying the readability of referenced images</title>
<qandadiv>
<qandaentry>
<question>
<para>We want to extend the example given in <xref
linkend="domFindImages"/> by testing the existence and
checking for readability of referenced images. The following
HTML document contains <quote>dead</quote> image
references:</para>
<programlisting language="none"
xml:id="domCheckImageAccessibility"><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"> ...
<body>
<h1>External Pictures</h1>
<p>A local image reference:<img src="inline.gif" alt="none"/></p>
<table>
<tbody>
<tr>
<td>An existing picture:</td>
<td><img
src="http://www.hdm-stuttgart.de/bilder_navigation/laptop.gif"
alt="none"/></td>
</tr>
<tr>
<td>A non-existing picture:</td>
<td><img src="<emphasis role="bold">http://www.hdm-stuttgart.de/rotfl.gif</emphasis>" alt="none"/></td>
</tr>
</tbody>
</table>
</body>
</html></programlisting>
<para>Write an application which checks for readability of
<abbrev
xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>
image references to <emphasis>external</emphasis> servers
starting either with <code>http://</code> or
<code>ftp://</code> ignoring other protocol types. Internal
image references referring to the <quote>current</quote>
server typically look like <code><img
src="/images/test.gif"/></code>. So in order to distinguish
these two types of references we may use the XPath built-in
function <link
xlink:href="http://www.cafeconleche.org/books/bible2/chapters/ch17.html">starts-with()</link>
testing for the <code>http</code> or <code>ftp</code> protocol
definition part of a <abbrev
xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>.
A possible output for the given example is:</para>
<programlisting language="none">Received 'sun.awt.image.URLImageSource' from
http://www.hdm-stuttgart.de/bilder_navigation/laptop.gif
Unable to open 'http://www.hdm-stuttgart.de/rotfl.gif'</programlisting>
<para>The following code snippet shows a helpful class method
to check for both correctness of <abbrev
xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>s
and accessibility of referenced objects:</para>
<programlisting language="none">package dom.xpath;
...
public class CheckUrl {
public static void checkReadability(final String urlRef) {
try {
final URL url = new URL(urlRef);
try {
final Object imgCandidate = url.getContent();
if (null == imgCandidate) {
System.err.println("Unable to open '" + urlRef + "'");
} else {
System.out.println("Received '"
+ imgCandidate.getClass().getName() + "' from "
+ urlRef);
}
} catch (IOException e) {
System.err.println("Unable to open '" + urlRef + "'");
}
} catch (MalformedURLException e) {
System.err.println("Address '" + urlRef + "' is malformed");
}
}
}</programlisting>
</question>
<answer>
<para>We are interested in the set of images within a given
HTML document containing a <link
xlink:href="http://www.w3.org/Addressing">URL</link> reference
starting either with <code>http://</code> or
<code>ftp://</code>. This is achieved by the following
<acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym>
expression:</para>
<programlisting language="none">//html:img[starts-with(@src, 'http://') or starts-with(@src, 'ftp://')]</programlisting>
<para>The application only needs to pass the corresponding
<abbrev
xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>s
to the method <link
xlink:href="domCheckUrlObjectExistence">CheckUrl.checkReadability()</link>.
The rest of the code is identical to the <link
linkend="domFindImages">introductory example</link>:</para>
<informalfigure xml:id="solutionFintExtImgRef">
<programlisting language="none">package dom.xpath;
...
public class CheckExtImage {
private final SAXBuilder builder = new SAXBuilder();
public CheckExtImage() {
builder.setErrorHandler(new MySaxErrorHandler(System.err));
}
public void process(final String xhtmlFilename) throws JDOMException, IOException {
final Document htmlInput = builder.build(xhtmlFilename);
final XPathExpression<Object> xpath = XPathFactory.instance().compile(
"<emphasis role="bold">//img[starts-with(@src, 'http://') or starts-with(@src, 'ftp://')]</emphasis>");
final List<Object> images = xpath.evaluate(htmlInput);
for (Object o: images) {
final Element image = (Element ) o;
<emphasis role="bold">CheckUrl.checkReadability(image.getAttributeValue("src"));</emphasis>
}
}
}</programlisting>
</informalfigure>
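<para>The effect of the <code>starts-with()</code> predicate can be
tried in isolation without network access. The following sketch assumes
the <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym>
implementation bundled with the JDK instead of JDOM and a sample page
without namespace declaration. The class name and the
<code>ftp://example.org</code> reference are made up for
illustration:</para>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ExternalImageFilter {
    // Select only images whose src attribute starts with http:// or ftp://
    static final String EXTERNAL_IMAGES =
        "//img[starts-with(@src, 'http://') or starts-with(@src, 'ftp://')]";

    // Made-up sample page mixing local, server relative and external references.
    static final String PAGE =
          "<html><body>"
        + "<p><img src='inline.gif' alt='none'/></p>"
        + "<p><img src='/images/test.gif' alt='none'/></p>"
        + "<p><img src='http://www.hdm-stuttgart.de/rotfl.gif' alt='none'/></p>"
        + "<p><img src='ftp://example.org/pic.gif' alt='none'/></p>"
        + "</body></html>";

    /** Return the src values of all external image references in document order. */
    static List<String> externalSources(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(EXTERNAL_IMAGES, doc, XPathConstants.NODESET);
            List<String> sources = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                // The expression selects img elements only, so the cast is safe.
                sources.add(((Element) hits.item(i)).getAttribute("src"));
            }
            return sources;
        } catch (Exception e) {
            throw new IllegalStateException("Unable to evaluate XPath expression", e);
        }
    }

    public static void main(String[] args) {
        for (String src : externalSources(PAGE)) {
            System.out.println(src); // a real application would check readability here
        }
    }
}
```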
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="domXsl">
<title><acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> and
<abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev></title>
<para><link linkend="gloss_Java"><trademark>Java</trademark></link>
based <link linkend="gloss_XML"><abbrev>XML</abbrev></link>
applications may use XSL style sheets for processing. A <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> tree may for example
be transformed into another tree. The package <link
xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/transform/package-frame.html">javax.xml.transform</link>
provides interfaces and classes for this purpose. We consider the
following product catalog example:</para>
<figure xml:id="climbingCatalog">
<title>A simplified <link
linkend="gloss_XML"><abbrev>XML</abbrev></link> product
catalog</title>
<programlisting language="none"><catalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="catalog.xsd">
<title>Outdoor products</title>
<introduction>
<para>We offer a great variety of basic stuff for mountaineering
such as ropes, harnesses and tents.</para>
<para>Our shop is proud of its large number of available
sleeping bags.</para>
</introduction>
<product id="x-223">
<title>Multi freezing bag Nightmare camper</title>
<description>
<para>You will feel comfortable till minus 20 degrees - At
least if you are a penguin or a polar bear.</para>
</description>
</product>
<product id="r-334">
<title>Rope 40m</title>
<description>
<para>Excellent for indoor climbing.</para>
</description>
</product>
</catalog></programlisting>
<para>A corresponding schema file <filename>catalog.xsd</filename>
is straightforward:</para>
<programlisting language="none"><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified"
vc:minVersion="1.0" vc:maxVersion="1.1">
<xs:simpleType name="money">
<xs:restriction base="xs:decimal">
<xs:fractionDigits value="2"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="title" type="xs:string"/>
<xs:element name="para" type="xs:string"/>
<xs:element name="description" type="paraSequence"/>
<xs:element name="introduction" type="paraSequence"/>
<xs:complexType name="paraSequence">
<xs:sequence>
<xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:element name="product">
<xs:complexType>
<xs:sequence>
<xs:element ref="title"/>
<xs:element ref="description"/>
</xs:sequence>
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="price" type="money" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="catalog">
<xs:complexType>
<xs:sequence>
<xs:element ref="title"/>
<xs:element ref="introduction"/>
<xs:element ref="product" minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
</programlisting>
</figure>
<para>An <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev>
style sheet may be used to transform this document into the HTML
format:</para>
<figure xml:id="catalog2html">
<title>An <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet
for catalog transformation to HTML.</title>
<programlisting language="none"><?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0" xmlns="http://www.w3.org/1999/xhtml">
<xsl:template match="/catalog">
<html>
<head><title><xsl:value-of select="title"/></title></head>
<body style="background-color:#FFFFFF">
<h1><xsl:value-of select="title"/></h1>
<xsl:apply-templates select="product"/>
</body>
</html>
</xsl:template>
<xsl:template match="product">
<h3><xsl:value-of select="title"/></h3>
<xsl:for-each select="description/para">
<p><xsl:value-of select="."/></p>
</xsl:for-each>
<xsl:if test="@price">
<p>
<xsl:text>Price:</xsl:text>
<xsl:value-of select="@price"/>
</p>
</xsl:if>
</xsl:template>
</xsl:stylesheet></programlisting>
</figure>
<para>As a preparation for <xref linkend="exercise_catalogRdbms"/> we
now demonstrate the usage of <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> within a <link
linkend="gloss_Java"><trademark>Java</trademark></link> application.
This is done by a <link
xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/transform/Transformer.html">Transformer</link>
instance:</para>
<figure xml:id="xml2xml">
<title>Transforming an XML document instance to HTML by an XSL style
sheet.</title>
<programlisting language="none">package dom.xsl;
...
public class Xml2Html {
private final SAXBuilder builder = new SAXBuilder();
final XSLTransformer transformer;
public Xml2Html(final String xslFilename) throws XSLTransformException {
builder.setErrorHandler(new MySaxErrorHandler(System.err));
transformer = new XSLTransformer(xslFilename);
}
public void transform(final String xmlInFilename,
final String resultFilename) throws JDOMException, IOException {
final Document inDoc = builder.build(xmlInFilename);
Document result = transformer.transform(inDoc);
// Set formatting for the XML output
final Format outFormat = Format.getPrettyFormat();
// Serialize to console
final XMLOutputter printer = new XMLOutputter(outFormat);
printer.output(result.getDocument(), System.out);
}
}</programlisting>
</figure>
<para>A corresponding driver file is needed to invoke a
transformation:</para>
<figure xml:id="xml2xmlDriver">
<title>A driver class for the xml2xml transformer.</title>
<programlisting language="none">package dom.xsl;
...
public class Xml2HtmlDriver {
...
public static void main(String[] args) {
final String
inFilename = "Input/Dom/climbing.xml",
xslFilename = "Input/Dom/catalog2html.xsl",
htmlOutputFilename = "Input/Dom/climbing.html";
try {
final Xml2Html converter = new Xml2Html(xslFilename);
converter.transform(inFilename, htmlOutputFilename);
} catch (Exception e) {
System.err.println("The conversion of '" + inFilename
+ "' by stylesheet '" + xslFilename
+ "' to output HTML file '" + htmlOutputFilename
+ "' failed with the following error:" + e);
e.printStackTrace();
}
}
}</programlisting>
</figure>
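<para>For comparison the same idea can be expressed by means of the
<code>javax.xml.transform</code> package alone. The following sketch
assumes the XSLT 1.0 processor bundled with the JDK instead of JDOM's
<classname>XSLTransformer</classname>. The class name, the shortened
style sheet and the shortened catalog are made up for illustration.
Input and output are strings rather than files:</para>

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class JaxpTransformSketch {
    // Shortened style sheet: one heading per product.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/catalog'>"
        + "<html><body><xsl:apply-templates select='product'/></body></html>"
        + "</xsl:template>"
        + "<xsl:template match='product'>"
        + "<h3><xsl:value-of select='title'/></h3>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Shortened catalog instance.
    static final String CATALOG =
          "<catalog><title>Outdoor products</title>"
        + "<product id='r-334'><title>Rope 40m</title></product>"
        + "</catalog>";

    /** Apply the style sheet text to the XML text and return the result as a string. */
    static String transform(String xml, String xsl) {
        try {
            // Compile the style sheet into a Transformer instance.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(CATALOG, XSL));
    }
}
```

<para>A <classname>Transformer</classname> instance is not thread safe.
When the same style sheet is applied many times, compiling it once into
a <classname>javax.xml.transform.Templates</classname> object may be
preferable.</para>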
<qandaset defaultlabel="qanda" xml:id="exercise_catalogRdbms">
<title>HTML from XML and relational data</title>
<qandadiv>
<qandaentry>
<question>
<label>Catalogs and RDBMS</label>
<para>We want to extend the transformation described
in <xref linkend="xml2xml"/> by reading price
information from an RDBMS. Consider the following schema and
<code>INSERT</code>s:</para>
<programlisting language="none">CREATE TABLE Product(
orderNo CHAR(10)
,price NUMERIC(10,2)
);
INSERT INTO Product VALUES('x-223', 330.20);
INSERT INTO Product VALUES('w-124', 110.40);</programlisting>
<para>Adding prices may be implemented the following
way:</para>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/xml2html.fig"/>
</imageobject>
</mediaobject>
<para>You may implement this by following these steps:</para>
<orderedlist>
<listitem>
<para>You may reuse class
<classname>sax.rdbms.RdbmsAccess</classname> from <xref
linkend="saxRdbms"/>.</para>
</listitem>
<listitem>
<para>Use the previous class to modify <xref
linkend="xml2xml"/> by introducing a new method
<code>addPrices(final Document catalog)</code> which adds
prices to the <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> tree
accordingly. The insertion points may be reached by an
<acronym
xlink:href="http://www.w3.org/TR/xpath">XPath</acronym>
expression.</para>
</listitem>
</orderedlist>
</question>
<answer>
<para>The additional functionality on top of <xref
linkend="xml2xml"/> is represented by a method
<methodname>dom.xsl.XmlRdbms2Html.addPrices()</methodname>.
This method modifies the <acronym
xlink:href="http://www.w3.org/DOM">DOM</acronym> input tree
prior to applying the XSL. Prices are inserted based on
data received from an RDBMS via <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>:</para>
<programlisting language="none">package dom.xsl;
...
public class XmlRdbms2Html {
private final SAXBuilder builder = new SAXBuilder();
DbAccess db = new DbAccess();
final XSLTransformer transformer;
Document catalog;
final org.jdom2.xpath.XPathExpression<Object> selectProducts =
XPathFactory.instance().compile("/catalog/product");
/**
* @param xslFilename the stylesheet being used for subsequent
* transformations by {@link #transform(String, String)}.
*
* @throws XSLTransformException
*/
public XmlRdbms2Html(final String xslFilename) throws XSLTransformException {
builder.setErrorHandler(new MySaxErrorHandler(System.err));
transformer = new XSLTransformer(xslFilename);
}
/**
* The actual workhorse carrying out the transformation
* and adding prices from the database table.
*
* @param xmlInFilename input file to be transformed
* @param resultFilename the result file holding the generated HTML document
* @throws JDOMException The transformation may fail for various reasons.
* @throws IOException
*/
public void transform(final String xmlInFilename,
final String resultFilename) throws JDOMException, IOException {
catalog = builder.build(xmlInFilename);
addPrices();
final Document htmlResult = transformer.transform(catalog);
// Set formatting for the XML output
final Format outFormat = Format.getPrettyFormat();
// Serialize to console
final XMLOutputter printer = new XMLOutputter(outFormat);
printer.output(htmlResult, System.out);
}
private void addPrices() {
final List<Object> products = selectProducts.evaluate(catalog.getRootElement());
db.connect("jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ");
for (Object p: products) {
final Element product = (Element ) p;
final String productId = product.getAttributeValue("id");
product.setAttribute("price", db.readPrice(productId));
}
db.close();
}
}</programlisting>
<para>The method <code>addPrices(...)</code> utilizes our
RDBMS access class:</para>
<programlisting language="none">package dom.xsl;
...
public class DbAccess {
public void connect(final String jdbcUrl,
final String userName, final String password) {
try {
conn = DriverManager.getConnection(jdbcUrl, userName, password);
priceQuery = conn.prepareStatement(sqlPriceQuery);
} catch (SQLException e) {
System.err.println("Unable to open connection to database:" + e);
}
}
public String readPrice(final String articleNumber) {
String result;
try {
priceQuery.setString(1, articleNumber);
final ResultSet rs = priceQuery.executeQuery();
if (rs.next()) {
result = rs.getString("price");
} else {
result = "No price available for article '" + articleNumber + "'";
}
} catch (SQLException e) {
result = "Error reading price for article '" + articleNumber + "':" + e;
}
return result;
}
...
}</programlisting>
<para>Of course the connection details should be moved to a
configuration file.</para>
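<para>One possible way of doing so is a Java properties file. The
following sketch is an assumption and not part of the original
solution: the class name and the keys <code>jdbc.url</code>,
<code>jdbc.user</code> and <code>jdbc.password</code> are made up for
illustration. To stay self-contained the example reads the
configuration from a string instead of a file:</para>

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical holder for connection details read from a properties source. */
class DbConfigSketch {
    final String jdbcUrl;
    final String user;
    final String password;

    private DbConfigSketch(Properties properties) {
        jdbcUrl = require(properties, "jdbc.url");
        user = require(properties, "jdbc.user");
        password = require(properties, "jdbc.password");
    }

    /** Fail early with a readable message if a key is missing. */
    private static String require(Properties properties, String key) {
        String value = properties.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing configuration key '" + key + "'");
        }
        return value;
    }

    /** Read key=value pairs from any character source, e.g. a FileReader. */
    static DbConfigSketch load(Reader source) {
        Properties properties = new Properties();
        try {
            properties.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException("Unable to read configuration", e);
        }
        return new DbConfigSketch(properties);
    }

    public static void main(String[] args) {
        // Stand-in for the content of a configuration file.
        String fileContent =
              "jdbc.url=jdbc:mysql://localhost:3306/hdm\n"
            + "jdbc.user=hdmuser\n"
            + "jdbc.password=XYZ\n";
        DbConfigSketch config = load(new StringReader(fileContent));
        System.out.println("Connecting to " + config.jdbcUrl + " as " + config.user);
    }
}
```

<para>The three values may then be passed to
<code>db.connect(...)</code> instead of the string literals used in
<code>addPrices()</code>.</para>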
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
</section>
</chapter>
<chapter xml:id="introPersistence">
<title>Accessing Relational Data</title>
<section xml:id="persistence">
<title>Persistence in Object Oriented languages</title>
<para>Following <xref linkend="bib_Bauer05"/> we may define persistence
by:</para>
<blockquote>
<para>persistence allows an object to outlive the process that created
it. The state of the object may be stored to disk and an object with
the same state re-created at some point in the future.</para>
</blockquote>
<para>The notion of <quote>process</quote> refers to operating systems.
Let us start with a simple example assuming a <link
linkend="gloss_Java"><trademark>Java</trademark></link> class
User:</para>
<programlisting language="none">public class User {
String cname; //The user's common name e.g. 'Joe Bix'
String uid; //The user's unique system ID (login name) e.g. 'bix'
// getters, setters and other stuff
...
}</programlisting>
<para>A relational implementation might look like:</para>
<programlisting language="none">CREATE TABLE User(
cname CHAR(80)
,uid CHAR(10) PRIMARY KEY
)</programlisting>
<para>Now a <link
linkend="gloss_Java"><trademark>Java</trademark></link> application may
create instances of class <code>User</code> and save these to a
database:</para>
<figure xml:id="processObjPersist">
<title>Persistence across process boundaries</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/persistence.fig" scale="65"/>
</imageobject>
</mediaobject>
</figure>
<para>Both the <trademark
xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark>
instances and the RDBMS database server are processes (or sets of
processes) typically existing in different address spaces. The two
<trademark
xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark>
processes mentioned here may as well be started in disjoint address
spaces. In fact we might even run two entirely different applications
implemented in different programming languages like <abbrev
xlink:href="http://www.php.net">PHP</abbrev>.</para>
<para>It is important to mention that the two arrows
<quote>save</quote> and <quote>load</quote> thus typically denote a
communication across machine boundaries.</para>
</section>
<section xml:id="jdbcIntro">
<title>Introduction to <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark></title>
<section xml:id="jdbcWrite">
<title>Write access, principles</title>
<para>Connecting an application to a database means to establish a
connection from a client to a database server:</para>
<figure xml:id="jdbcClientServer">
<title>Networking between clients and database servers</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/clientserv.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>So <trademark
xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark>
is just one among a whole bunch of protocol implementations connecting
database servers and applications. Consequently <trademark
xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark>
is expected to appear in the lower layer of multi-tier applications.
We take a three-tier application as a starting point:</para>
<figure xml:id="jdbcThreeTier">
<title>The role of <trademark
xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark>
in a three-tier application</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/jdbcThreeTier.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>We may add an additional layer. Web applications are typically
built on top of an application server (<productname
xlink:href="http://www.ibm.com/software/de/websphere/">WebSphere</productname>,
<productname
xlink:href="http://glassfish.java.net">Glassfish</productname>,
<productname
xlink:href="http://www.jboss.org/jbossas">Jboss</productname>,...)
providing additional services:</para>
<figure xml:id="jdbcFourTier">
<title><trademark
xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark>
connecting application server and database.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/jdbcFourTier.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>So what is actually required to connect to a database server? A
client requires the following parameter values to open a
connection:</para>
<orderedlist>
<listitem xml:id="ItemJdbcProtocol">
<para>The type of database server i.e. <productname
xlink:href="http://www.oracle.com/us/products/database">Oracle</productname>,
<productname
xlink:href="http://www.ibm.com/software/data/db2">DB2</productname>,
<productname
xlink:href="http://www-01.ibm.com/software/data/informix">Informix</productname>,
<productname xlink:href="http://www.mysql.com">MySQL</productname>
etc. This information is needed because of vendor dependent
<trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
protocol implementations.</para>
</listitem>
<listitem>
<para>The server's <link
xlink:href="http://en.wikipedia.org/wiki/Domain_Name_System">DNS</link>
name or IP address</para>
</listitem>
<listitem>
<para>The database service's port number at the previously defined
host. The database server process listens for connections to this
port number.</para>
</listitem>
<listitem xml:id="itemJdbcDatabaseName">
<para>The database name within the given database server</para>
</listitem>
<listitem>
<para>Optional: A database user's account name and
password.</para>
</listitem>
</orderedlist>
<para>Items <xref linkend="ItemJdbcProtocol"/> to <xref
linkend="itemJdbcDatabaseName"/> are encapsulated in a so-called
<trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
<link
xlink:href="http://en.wikipedia.org/wiki/Uniform_Resource_Locator">URL</link>.
We consider a typical example corresponding to the previous parameter
list:</para>
<figure xml:id="jdbcUrlComponents">
<title>Components of a <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
URL</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/jdbcurl.fig" scale="65"/>
</imageobject>
</mediaobject>
</figure>
<para>In fact this <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
URL example closely resembles other types of URL strings as
defined in <uri
xlink:href="http://www.ietf.org/rfc/rfc2396.txt">http://www.ietf.org/rfc/rfc2396.txt</uri>.
Look for <code>opaque_part</code> to understand the second
<quote>:</quote> in the protocol definition part of a <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
URL. Common examples of <abbrev
xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>s
are:</para>
<itemizedlist>
<listitem>
<para><code>http://www.hdm-stuttgart.de/aaa</code></para>
</listitem>
<listitem>
<para><code>http://someserver.com:8080/someResource</code></para>
</listitem>
<listitem>
<para><code>ftp://mirror.mi.hdm-stuttgart.de/Firmen</code></para>
</listitem>
</itemizedlist>
<para>Note the explicit port number 8080 in the
second example. The default <abbrev
xlink:href="http://www.w3.org/Protocols">http</abbrev> protocol port
number is 80. So if a web server accepts connections at port 80 we do
not have to specify this value: a web browser will automatically use
this default port.</para>
<para>The prefix <quote><code>jdbc:mysql</code></quote>
denotes a sub protocol implementation, namely
<orgname>Mysql</orgname>'s implementation of <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>.
Connecting to an IBM DB2 server would require <code>jdbc:db2</code> for this
protocol part.</para>
<para>In contrast to <abbrev
xlink:href="http://www.w3.org/Protocols">http</abbrev> no standard
ports are <quote>officially</quote> assigned for <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
protocol variants. Since every vendor ships its own implementation, a
common default port would not make sense. Thus we <emphasis role="bold">always</emphasis>
have to specify the port number when opening <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
connections.</para>
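<para>The following sketch splits a URL of the shape
<code>jdbc:SUBPROTOCOL://HOST:PORT/DATABASE</code> into the parameters
listed above. The class name <code>JdbcUrlParts</code> is an assumption
made for illustration only, and vendors are free to use URL layouts that
do not follow this pattern, so it should not be read as a general
parser:</para>

```java
import java.net.URI;

/** Hypothetical helper, for illustration only. */
public class JdbcUrlParts {
    public final String subProtocol; // e.g. "mysql"
    public final String host;        // DNS name or IP address
    public final int port;           // -1 if the URL names no port
    public final String database;    // database name on the server

    private JdbcUrlParts(String subProtocol, String host, int port, String database) {
        this.subProtocol = subProtocol;
        this.host = host;
        this.port = port;
        this.database = database;
    }

    /** Splits a URL of the form jdbc:SUBPROTOCOL://HOST:PORT/DATABASE. */
    public static JdbcUrlParts parse(String jdbcUrl) {
        if (!jdbcUrl.startsWith("jdbc:")) {
            throw new IllegalArgumentException("Not a JDBC URL: " + jdbcUrl);
        }
        // Without the leading "jdbc:" the remainder is an ordinary hierarchical URI
        final URI rest = URI.create(jdbcUrl.substring("jdbc:".length()));
        final String path = rest.getPath();
        final String database = path.startsWith("/") ? path.substring(1) : path;
        return new JdbcUrlParts(rest.getScheme(), rest.getHost(), rest.getPort(), database);
    }

    public static void main(String[] args) {
        final JdbcUrlParts parts = parse("jdbc:mysql://localhost:3306/hdm");
        // prints "mysql localhost 3306 hdm"
        System.out.println(parts.subProtocol + " " + parts.host + " "
                + parts.port + " " + parts.database);
    }
}
```

<para>Removing the <code>jdbc:</code> prefix leaves a string that
<classname>java.net.URI</classname> can parse like any
<code>http</code> URL, which illustrates the resemblance described
above.</para>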
<para>Writing <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
based applications follows a simple scheme:</para>
<figure xml:id="jdbcArchitecture">
<title>Architecture of JDBC</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/jdbcarch.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>From a programmer's point of view the
<classname>java.sql.DriverManager</classname> is a bootstrapping
object: Other objects like Statement instances are created from this
central and unique object.</para>
<para>The first instance being created by the
<classname>java.sql.DriverManager</classname> is an object of type
<classname>java.sql.Connection</classname>. In <xref
linkend="exerciseJdbcWhyInterface"/> we discuss the way vendor
specific implementation details are hidden by interfaces. We can
distinguish between:</para>
<orderedlist>
<listitem>
<para>Vendor neutral parts of a <trademark
xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark>
environment. These components are shipped by Oracle or
other organizations providing <link
linkend="gloss_Java"><trademark>Java</trademark></link> runtimes.
The class <classname>java.sql.DriverManager</classname> belongs to
this domain.</para>
</listitem>
<listitem>
<para>Vendor specific parts. In <xref linkend="jdbcArchitecture"/>
this starts with the <classname>java.sql.Connection</classname>
object.</para>
</listitem>
</orderedlist>
<para>The <classname>java.sql.Connection</classname> object thus marks
the boundary between a <trademark
xlink:href="http://www.oracle.com/technetwork/java/javase">JDK</trademark>
/ <trademark
xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark>
and a <trademark
xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark>
Driver implementation from e.g. Oracle or other institutions.</para>
<para><xref linkend="jdbcArchitecture"/> does not show details about
the relations between <classname>java.sql.Connection</classname>,
<classname>java.sql.Statement</classname> and
<classname>java.sql.ResultSet</classname> objects. We start by giving
a rough description of the tasks and responsibilities these three
types have:</para>
<glosslist>
<glossentry>
<glossterm><classname>java.sql.Connection</classname></glossterm>
<glossdef>
<para>Holding a permanent connection to a database server. Both
client and server can contact each other. The database server
may for example terminate a transaction if problems like
deadlocks occur.</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm><classname>java.sql.Statement</classname></glossterm>
<glossdef>
<para>We have two distinct classes of actions:</para>
<orderedlist>
<listitem>
<para>Instructions to modify data on the database server.
These include <code>INSERT</code>, <code>UPDATE</code> and
<code>DELETE</code> operations as far as
<abbrev>SQL-DML</abbrev> is concerned. <trademark
xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark>
acts as a means of transport and merely returns integer
values back to the client like the number of rows being
affected by an UPDATE.</para>
</listitem>
<listitem>
<para>Instructions reading data from the server. This is
done by sending SELECT statements. It is not sufficient to
just return integer values: Instead <trademark
xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark>
needs to copy complete datasets back to the client to fill
containers being accessible by applications. This is being
discussed in <xref linkend="jdbcRead"/>.</para>
</listitem>
</orderedlist>
</glossdef>
</glossentry>
</glosslist>
<para>We shed some light on the relationship between these important
<trademark
xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark>
components and their respective creation:<figure
xml:id="jdbcObjectCreation">
<title>Important <trademark
xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark>
instances and relationships.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/jdbcObjectRelation.fig"/>
</imageobject>
</mediaobject>
</figure></para>
</section>
<section xml:id="writeAccessCoding">
<title>Write access, coding!</title>
<para>So how does it actually work with respect to coding? You may
want to read <xref linkend="toolingConfigJdbc"/> before starting your
exercises. We first prepare a database table using Eclipse's database
tools:</para>
<figure xml:id="figSchemaPerson">
<title>A relation <code>Person</code> containing names and email
addresses</title>
<programlisting language="none"><emphasis role="strong">CREATE</emphasis> <emphasis
role="strong">TABLE</emphasis> Person (
name CHAR(20)
,email CHAR(20) <emphasis>UNIQUE</emphasis>)</programlisting>
</figure>
<para>Our actual (toy) <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
application will insert a single object ('Jim', 'jim@foo.org') into
the <code>Person</code> relation. This is simpler than reading data
since no client <classname>java.sql.ResultSet</classname> container is
needed:</para>
<figure xml:id="figJdbcSimpleWrite">
<title>A simple <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
application inserting data into a relational table.</title>
<programlisting language="none">01 package sda.jdbc.intro.v1;
02
03 import java.sql.Connection;
04 import java.sql.DriverManager;
05 import java.sql.SQLException;
06 import java.sql.Statement;
07
08 public class SimpleInsert {
09
10   public static void main(String[] args) throws SQLException {
11     // Step 1: Open a connection to the database server
12     final Connection conn = DriverManager.getConnection(
13         "jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ");
14     // Step 2: Create a Statement instance
15     final Statement stmt = conn.createStatement();
16     // Step 3: Execute the desired INSERT
17     final int updateCount = stmt.executeUpdate(
18         "INSERT INTO Person VALUES('Jim', 'jim@foo.org')");
19     // Step 4: Give feedback to the enduser
20     System.out.println("Successfully inserted " + updateCount + " dataset(s)");
21   }
22 }</programlisting>
</figure>
<para>Looks simple? Unfortunately it does not (yet) work:</para>
<programlisting language="none">Exception in thread "main" java.sql.SQLException: <emphasis
role="bold">
No suitable driver found for jdbc:mysql://localhost:3306/hdm</emphasis>
at java.sql.DriverManager.getConnection(DriverManager.java:604)
at java.sql.DriverManager.getConnection(DriverManager.java:221)
at sda.jdbc.intro.SimpleInsert.main(SimpleInsert.java:12)</programlisting>
<para>What's wrong here? In <xref linkend="figureConfigJdbcDriver"/>
we needed a <productname
xlink:href="http://www.mysql.com">Mysql</productname> <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
Driver implementation <filename>mysql-connector-java.jar</filename> as
a prerequisite to open connections to a database server. This
implementation is mandatory for our toy application as well. All we
have to do is add <filename>mysql-connector-java.jar</filename> to
our <link linkend="gloss_Java"><trademark>Java</trademark></link>
<varname>CLASSPATH</varname> at <emphasis
role="bold">runtime</emphasis>.</para>
<para>Depending on our <link
linkend="gloss_Java"><trademark>Java</trademark></link> environment
this will be achieved by different means. Eclipse requires the
definition of a run configuration as described in <uri
xlink:href="http://help.eclipse.org/juno/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-java-local-configuration.htm">http://help.eclipse.org/juno/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-java-local-configuration.htm</uri>.
When creating a run configuration for
<classname>sda.jdbc.intro.SimpleInsert</classname> we have to add
<filename>mysql-connector-java.jar</filename> to the
<varname>Classpath</varname> tab. The following screenshot shows a
working configuration:</para>
<figure xml:id="figureConfigRunExtJar">
<title>Creating an Eclipse run time configuration containing a
<productname xlink:href="http://www.mysql.com">Mysql</productname>
<trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
Driver Jar marked red.</title>
<screenshot>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Screen/runConfigJarAnnot.screen.png"
scale="70"/>
</imageobject>
</mediaobject>
</screenshot>
</figure>
<para>This time execution works as expected:</para>
<programlisting language="none">Successfully inserted 1 dataset(s)</programlisting>
<qandaset defaultlabel="qanda" xml:id="quandaentry_DupInsert">
<title>Exception on inserting objects</title>
<qandadiv>
<qandaentry>
<question>
<para>A second invocation of
<classname>sda.jdbc.intro.v1.SimpleInsert</classname> yields
the following runtime error:</para>
<programlisting language="none">Exception in thread "main"
com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException:
<emphasis role="bold">Duplicate entry 'jim@foo.org' for key 'email'</emphasis>
...
at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1617)
at sda.jdbc.intro.SimpleInsert.main(SimpleInsert.java:17)</programlisting>
</question>
<answer>
<para>This expected error is easy to understand: The
exception's message text <emphasis role="bold">Duplicate entry
'jim@foo.org' for key 'email'</emphasis> informs us about a UNIQUE
key constraint violation with respect to the attribute
<code>email</code> in our schema definition in <xref
linkend="figSchemaPerson"/>. We cannot add a second entry with
the same value <code>'jim@foo.org'</code>.</para>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<para>It is worth mentioning that the <productname
xlink:href="http://www.mysql.com">Mysql</productname> driver
implementation does not have to be available at compile time.
<trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
defines interfaces rather than (concrete) classes. The latter are
only required at runtime.</para>
<para>When working with Eclipse we need a separate runtime
configuration for each runnable <link
linkend="gloss_Java"><trademark>Java</trademark></link> application to
add the <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
driver implementation to the runtime <envar>CLASSPATH</envar>. This
may become tedious. Weighing the pros and cons you may simply add
<filename>mysql-connector-java.jar</filename> to your compile time
<envar>CLASSPATH</envar> as well. As a drawback all <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
implementing classes will now become visible when e.g. hitting
auto-completion.</para>
<para>We now discuss some important methods being defined in the
<trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
interfaces:</para>
<glosslist>
<glossentry>
<glossterm><classname>java.sql.Connection</classname></glossterm>
<glossdef>
<itemizedlist>
<listitem>
<para><link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#createStatement()">createStatement()</link></para>
</listitem>
<listitem>
<para><link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#setAutoCommit(boolean)">setAutoCommit()</link>,
<link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#getAutoCommit()">getAutoCommit()</link></para>
</listitem>
<listitem>
<para><link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#getWarnings()">getWarnings()</link></para>
</listitem>
<listitem>
<para><link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isClosed()">isClosed()</link>,
<link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isValid(int)">isValid(int
timeout)</link></para>
</listitem>
<listitem>
<para><link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#rollback()">rollback()</link>
and <link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#commit()">commit()</link></para>
</listitem>
<listitem>
<para><link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#close()">close()</link></para>
</listitem>
</itemizedlist>
</glossdef>
</glossentry>
<glossentry>
<glossterm><classname>java.sql.Statement</classname></glossterm>
<glossdef>
<itemizedlist>
<listitem>
<para><link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#executeUpdate(java.lang.String)">executeUpdate(String
sql)</link></para>
</listitem>
<listitem>
<para><link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#getConnection()">getConnection()</link></para>
</listitem>
<listitem>
<para><link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#getResultSet()">getResultSet()</link></para>
</listitem>
<listitem>
<para><link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#close()">close()</link>
and <link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#isClosed()">isClosed()</link></para>
</listitem>
</itemizedlist>
</glossdef>
</glossentry>
</glosslist>
<qandaset defaultlabel="qanda" xml:id="quandaentry_AutoCommit">
<title><trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
and transactions</title>
<qandadiv>
<qandaentry>
<question>
<para>How does the method <link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#setAutoCommit(boolean)">setAutoCommit()</link>
relate to <link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#commit()">commit()</link>
and <link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#rollback()">rollback()</link>?</para>
</question>
<answer>
<para>A connection's default state is <code>autocommit ==
true</code>. This means that individual SQL statements are
executed as separate transactions.</para>
<para>If we want to group two or more statements into a
transaction we have to:</para>
<orderedlist>
<listitem>
<para>Call
<code>connection.setAutoCommit(false)</code></para>
</listitem>
<listitem>
<para>From now on subsequent SQL statements will
implicitly become part of a transaction until one of the
following three events happens:</para>
<orderedlist numeration="loweralpha">
<listitem>
<para><code>connection.commit()</code></para>
</listitem>
<listitem>
<para><code>connection.rollback()</code></para>
</listitem>
<listitem>
<para>The transaction gets aborted by the database
server. This may for example happen in case of a
deadlock conflict with a second transaction.</para>
</listitem>
</orderedlist>
<para>Note that the first two events are initiated by our
client software. The third one is triggered by the
database server.</para>
</listitem>
</orderedlist>
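<para>The steps above can be sketched as follows. The class and method
names (<code>TransactionSketch</code>,
<code>executeAtomically</code>) are assumptions made for illustration.
Only the <classname>java.sql.Connection</classname> and
<classname>java.sql.Statement</classname> calls belong to the standard
API:</para>

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper, for illustration only. */
public class TransactionSketch {

    /** Runs all statements as one transaction: either all take effect or none. */
    public static int executeAtomically(Connection conn, String... sqlStatements)
            throws SQLException {
        final boolean previousMode = conn.getAutoCommit();
        conn.setAutoCommit(false);            // step 1: stop committing each statement
        try (Statement stmt = conn.createStatement()) {
            int affectedRows = 0;
            for (String sql : sqlStatements) {
                affectedRows += stmt.executeUpdate(sql);
            }
            conn.commit();                    // event (a): make all changes permanent
            return affectedRows;
        } catch (SQLException e) {
            conn.rollback();                  // event (b): undo the partial work
            throw e;
        } finally {
            conn.setAutoCommit(previousMode); // restore the connection's former mode
        }
    }
}
```

<para>A caller would pass an open connection together with two or more
<code>INSERT</code>, <code>UPDATE</code> or <code>DELETE</code>
strings. Whether calling <code>rollback()</code> is still required after
a server side abort depends on the driver, so the
<code>catch</code> clause merely shows one cautious option.</para>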
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<qandaset defaultlabel="qanda" xml:id="quandaentry_Close">
<title>Closing <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
connections</title>
<qandadiv>
<qandaentry>
<question>
<para>Why is it very important to call the close() method for
<classname>java.sql.Connection</classname> and / or
<classname>java.sql.Statement</classname> instances?</para>
</question>
<answer>
<para>A <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
connection ties up network resources (socket connections). These
may be exhausted if e.g. new connections get established within
a loop without being closed.</para>
<para>The situation is comparable to memory leaks when using
programming languages lacking a garbage collector.</para>
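<para>One way to make sure that <code>close()</code> is always called
is Java's try-with-resources statement. The sketch below is only an
illustration: the names <code>CloseSketch</code> and
<code>executeAndClose</code> are assumptions, and the method
deliberately closes the connection handed to it:</para>

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper, for illustration only. */
public class CloseSketch {

    /** Executes one modifying statement, then releases statement and connection. */
    public static int executeAndClose(Connection conn, String sql) throws SQLException {
        try (Connection c = conn;
             Statement stmt = c.createStatement()) {
            return stmt.executeUpdate(sql);
        } // stmt.close() and then c.close() are called here, even if an exception occurs
    }
}
```

<para>Resources are closed in reverse order of their declaration, so
the statement is released before its connection.</para>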
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<qandaset defaultlabel="qanda" xml:id="quandaentry_AbortTran">
<title>Aborted transactions</title>
<qandadiv>
<qandaentry>
<question>
<para>In the previous exercise we mentioned the possibility of
a transaction abort issued by the database server. Which
responsibility arises for an application programmer? Hint: How
may an implementation become aware of such a transaction
abort?</para>
</question>
<answer>
<para>If a database server aborts a transaction a
<classname>java.sql.SQLException</classname> will be thrown.
An application must be aware of this possibility and thus
implement a sensible <code>catch(...)</code> clause
accordingly.</para>
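<para>A minimal sketch of such a <code>catch</code> clause follows. It
relies on the SQL standard's convention that SQLSTATE values of class
<code>40</code> denote a transaction rollback. Whether a particular
driver actually reports such a value is an assumption to be verified,
and the exception is simulated here, so no database is involved:</para>

```java
import java.sql.SQLException;

/** Hypothetical helper, for illustration only. */
public class AbortSketch {

    /** True if the SQLSTATE belongs to class "40" (transaction rollback). */
    public static boolean isTransactionRollback(SQLException e) {
        final String sqlState = e.getSQLState();
        if (sqlState == null) {
            return false;
        }
        return sqlState.startsWith("40");
    }

    public static void main(String[] args) {
        try {
            // Simulated server abort; message, state and vendor code are made up
            throw new SQLException("Deadlock found", "40001", 1213);
        } catch (SQLException e) {
            if (isTransactionRollback(e)) {
                // prints "Transaction aborted by server, SQLSTATE=40001"
                System.out.println("Transaction aborted by server, SQLSTATE=" + e.getSQLState());
            } else {
                System.out.println("Other database error: " + e.getMessage());
            }
        }
    }
}
```

<para>An application may use this distinction to decide between
repeating the whole transaction and reporting the error to the
user.</para>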
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<qandaset defaultlabel="qanda" xml:id="exerciseJdbcWhyInterface">
<title>Interfaces and classes in <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark></title>
<qandadiv>
<qandaentry>
<question>
<para>The <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
standard mostly defines interfaces such as
<classname>java.sql.Connection</classname> and
<classname>java.sql.Statement</classname>. Why are these not
defined as classes? Moreover why is
<classname>java.sql.DriverManager</classname> defined as
a class rather than an interface?</para>
<para>You may want to supply code examples to explain your
argumentation.</para>
</question>
<answer>
<para>Figure <xref linkend="jdbcArchitecture"/> tells us about
the vendor independent architecture of <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>.
Oracle for example may implement a class
<code>com.oracle.jdbc.OracleConnection</code>:</para>
<programlisting annotations="nojavadoc" language="java">package com.oracle.jdbc;

import java.sql.Connection;
import java.sql.Statement;
import java.sql.SQLException;

public class OracleConnection implements Connection {
  ...
  // Interface methods are implicitly public, so the implementation must be public too
  public Statement createStatement(int resultSetType,
                                   int resultSetConcurrency)
      throws SQLException {
    // Implementation omitted here due to
    // limited personal hacking capabilities
    ...
  }
  ...
}</programlisting>
<para>If a programmer only uses the <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
interfaces rather than a vendor's classes it is much easier to
make the resulting application work with different databases
from other vendors. This way a company's implementation is not
exposed to our own <link
linkend="gloss_Java"><trademark>Java</trademark></link>
code.</para>
<para>Regarding the special role of
<classname>java.sql.DriverManager</classname> we notice the
need of a starting point: We have to create an initial
instance of some class. In theory (<emphasis role="bold">BUT
NOT IN PRACTICE!!!</emphasis>) the following (ugly code) might
be possible:</para>
<programlisting language="none">package my.personal.application;

import java.sql.Connection;
import java.sql.Statement;
import java.sql.SQLException;
import com.oracle.jdbc.OracleConnection; // the hypothetical vendor class from above

public class someClass {
  public void someMethod() {
    Connection conn = <emphasis role="bold">new OracleConnection()</emphasis>; // bad idea!
    ...
  }
  ...
}</programlisting>
<para>The problem with this approach is the explicit
constructor call: Whenever we want to use another database we
have two possibilities:</para>
<itemizedlist>
<listitem>
<para>Rewrite our code.</para>
</listitem>
<listitem>
<para>Introduce some sort of switch statement to provide a
fixed number of databases beforehand:</para>
<programlisting language="none">public void someMethod(final String vendor) {
  final Connection conn;
  switch (vendor) {
    case "ORACLE":
      conn = new OracleConnection();
      break;
    case "DB2":
      conn = new Db2Connection();
      break;
    default:
      conn = null;
      break;
  }
  ...
}</programlisting>
<para>Adding a new database still requires code
rewriting.</para>
</listitem>
</itemizedlist>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<qandaset defaultlabel="qanda" xml:id="quandaentry_DriverDispatch">
<title>Driver dispatch mechanism</title>
<qandadiv>
<qandaentry>
<question>
<para>In exercise <xref linkend="exerciseJdbcWhyInterface"/>
we saw a hypothetical way to solve the interface/class
resolution problem by using a switch clause. How is this
<code>switch</code> clause's logic actually realized in a
<trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
based application? (<quote>behind the scenes</quote>)</para>
<para>Hint: Read the documentation of
<classname>java.sql.DriverManager</classname>.</para>
</question>
<answer>
<para>Prior to opening a connection a <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
driver registers itself with the
<classname>java.sql.DriverManager</classname> class. For this
purpose the standard defines the static method
<link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/DriverManager.html#registerDriver(java.sql.Driver)">registerDriver(Driver)</link>.
On success the <classname>java.sql.DriverManager</classname>
adds the driver to an internal registry which we may picture as a
dictionary:</para>
<informaltable border="1">
<col width="20%"/>
<col width="30%"/>
<tr>
<th>protocol</th>
<th>driver instance</th>
</tr>
<tr>
<td>jdbc:mysql</td>
<td>mysqlDriver instance</td>
</tr>
<tr>
<td>jdbc:oracle</td>
<td>oracleDriver instance</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
</tr>
</informaltable>
<para>So whenever the method <link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/DriverManager.html#getConnection(java.lang.String,%20java.lang.String,%20java.lang.String)">getConnection()</link>
is being called the
<classname>java.sql.DriverManager</classname> will scan the
<trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
URL and isolate the protocol part. If we start with
<code>jdbc:mysql://someserver.com:3306/someDatabase</code>
this is just <code>jdbc:mysql</code>. The value is then
looked up in the above table of registered drivers to choose
an appropriate instance. If no registered driver matches, a
<classname>java.sql.SQLException</classname> (<quote>No suitable
driver found</quote>) is thrown, as we saw when the driver Jar was
missing. This way our hypothetical switch including its default
branch is actually implemented.</para>
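<para>The registration and lookup can be tried out without a database
by registering a toy driver. Both the class <code>ToyDriver</code> and
the sub protocol <code>jdbc:toy</code> are invented for this sketch.
Note that the dictionary above is a simplified picture: in the
<trademark>JDK</trademark> implementation the
<classname>java.sql.DriverManager</classname> asks its registered
drivers one after another whether they accept a given URL:</para>

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

/** Hypothetical driver for an invented "jdbc:toy:" sub protocol. */
public class ToyDriver implements Driver {

    @Override
    public boolean acceptsURL(String url) {
        if (url == null) {
            return false;
        }
        return url.startsWith("jdbc:toy:");
    }

    @Override
    public Connection connect(String url, Properties info) throws SQLException {
        if (!acceptsURL(url)) {
            return null; // not our sub protocol: let another driver try
        }
        throw new SQLException("ToyDriver cannot open real connections");
    }

    @Override
    public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
        return new DriverPropertyInfo[0];
    }

    @Override public int getMajorVersion() { return 0; }
    @Override public int getMinorVersion() { return 1; }
    @Override public boolean jdbcCompliant() { return false; }

    @Override
    public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException("no logging in this sketch");
    }

    public static void main(String[] args) throws SQLException {
        DriverManager.registerDriver(new ToyDriver());
        // The lookup selects the registered driver accepting this URL
        final Driver chosen = DriverManager.getDriver("jdbc:toy://localhost:1234/someDatabase");
        System.out.println(chosen.getClass().getSimpleName());
        try {
            DriverManager.getDriver("jdbc:unknown://localhost:1/someDatabase");
        } catch (SQLException e) {
            System.out.println("Lookup failed: " + e.getMessage());
        }
    }
}
```

<para>Real drivers typically perform the
<code>registerDriver()</code> call themselves when their driver class
is loaded, so application code rarely contains it.</para>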
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="propertiesFile">
<title>Connection properties</title>
<para>So far our application depicted in <xref
linkend="figJdbcSimpleWrite"/> suffers both from missing error
handling and hard-coded parameters.</para>
<para>Professional applications must be configurable. Changing the
password currently requires source code modification and
recompilation. <link
linkend="gloss_Java"><trademark>Java</trademark></link> offers a
standard procedure to externalize parameters like
<varname>username</varname>, <varname>password</varname> and the <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
connection URL as present in <xref
linkend="figJdbcSimpleWrite"/>: We may move these parameters to
so-called properties files:</para>
<figure xml:id="propertyExternalization">
<title>Externalize a single string <code>"User name"</code> to a
separate file <filename>message.properties</filename>.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/externalize.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>The figure above shows the externalization of just a single
property. The file <filename>message.properties</filename> contains
key-value pairs. The key <code>PropHello.uname</code> maps to the
value <code>User name</code>. Multiple strings may be externalized to
the same properties file.</para>
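<para>The key-value lookup itself can be sketched without any file.
The snippet below feeds the line
<code>PropHello.uname=User name</code> to a
<classname>java.util.PropertyResourceBundle</classname> from a string.
The class name <code>PropertiesSketch</code> is an assumption made for
illustration, and a real application would read the properties file
from its <varname>CLASSPATH</varname> instead:</para>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

/** Hypothetical helper, for illustration only. */
public class PropertiesSketch {

    /** Parses text in properties file syntax into a bundle of key-value pairs. */
    public static ResourceBundle parse(String propertiesText) throws IOException {
        return new PropertyResourceBundle(new StringReader(propertiesText));
    }

    public static void main(String[] args) throws IOException {
        // Inline stand-in for the content of message.properties
        final ResourceBundle bundle = parse("PropHello.uname=User name\n");
        // prints "User name"
        System.out.println(bundle.getString("PropHello.uname"));
    }
}
```

<para>Asking the bundle for a key that is not defined raises a
<classname>java.util.MissingResourceException</classname>.</para>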
<para>Eclipse offers tool support for externalization. Simply choose
Source → Externalize Strings from the context menu. This
activates a wizard to define property keys, rename the generated
helper class and finally create the actual
<filename>message.properties</filename> file.</para>
<qandaset defaultlabel="qanda" xml:id="quandaentry_WritProps">
<title>Moving <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
<abbrev
xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev> and
credentials to a property file</title>
<qandadiv>
<qandaentry>
<question>
<para>Start by executing the code given in <xref
linkend="figJdbcSimpleWrite"/>. Then extend this example by
externalizing all <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
related connection parameters to a
<filename>jdbc.properties</filename> file like:</para>
<programlisting language="none">SimpleInsert.jdbcUrl=jdbc:mysql://localhost:3306/hdm
SimpleInsert.password=XYZ
SimpleInsert.username=hdmuser</programlisting>
<para>As stated earlier the Eclipse wizard assists you
by generating both the properties file and a helper class
reading that file at runtime.</para>
</question>
<answer>
<para>The current exercise is mostly related to tooling. From
our <link
linkend="gloss_Java"><trademark>Java</trademark></link> code
the context menu allows us to choose the desired
wizard:</para>
<informalfigure>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Screen/externalize.screen.png"/>
</imageobject>
</mediaobject>
</informalfigure>
<para>We may now:</para>
<itemizedlist>
<listitem>
<para>Select the strings to be externalized.</para>
</listitem>
<listitem>
<para>Supply key names. In the subsequent screenshot this
task has already been started by manually replacing the
default <code>SimpleInsert.1</code> by
<code>SimpleInsert.jdbc</code>.</para>
</listitem>
<listitem>
<para>Redefine other parameters like prefix, properties
file name etc. In the following screenshot only the first
of three keys has been manually renamed to the sensible
value <varname>SimpleInsert.jdbc</varname>.</para>
</listitem>
</itemizedlist>
<informalfigure>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Screen/externalize2.screen.png"/>
</imageobject>
</mediaobject>
</informalfigure>
<para>The wizard also generates a class
<classname>sda.jdbc.intro.v1.DbProps</classname> to actually
access our properties:</para>
<programlisting language="none">package sda.jdbc.intro.v1;
...
public class DbProps {
  private static final String BUNDLE_NAME = "sda.jdbc.intro.v1.database";
  private static final ResourceBundle RESOURCE_BUNDLE = ResourceBundle
      .getBundle(BUNDLE_NAME);

  private DbProps() {
  }

  public static String getString(String key) {
    try {
      return RESOURCE_BUNDLE.getString(key);
    } catch (MissingResourceException e) {
      return '!' + key + '!';
    }
  }
}</programlisting>
<para>Our <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
related code now contains three references to external
properties:</para>
<programlisting language="none">package sda.jdbc.intro.v1;
...
public class SimpleInsert {
  public static void main(String[] args) throws SQLException {
    // Step 1: Open a connection to the database server
    final Connection conn = DriverManager.getConnection(
        <emphasis role="bold">DbProps.getString("PersistenceHandler.jdbcUrl")</emphasis>,
        <emphasis role="bold">DbProps.getString("PersistenceHandler.username")</emphasis>,
        <emphasis role="bold">DbProps.getString("PersistenceHandler.password")</emphasis>);
    // Step 2: Create a Statement instance
    final Statement stmt = conn.createStatement();
    // Step 3: Execute the desired INSERT
    final int updateCount = stmt.executeUpdate(
        "INSERT INTO Person VALUES('Jim', 'jim@foo.org')");
    // Step 4: Give feedback to the enduser
    System.out.println("Successfully inserted " + updateCount + " dataset(s)");
  }
}</programlisting>
<para>The current base name
<classname>sda.jdbc.intro.v1.PersistenceHandler</classname> is
related to a later exercise.</para>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="sectSimpleInsertGui">
<title>A first GUI sketch</title>
<para>So far all data records being transferred to the database server
are still hard-coded in our application. In practice a user wants to
enter data of persons to be submitted to the database.</para>
<para>We now guide you to develop a first version of a simple GUI for
this task. A more <link linkend="figureDataInsert2">elaborate
version</link> will be presented in a follow-up exercise. The
screenshot illustrates the intended application behaviour:</para>
<figure xml:id="simpleInsertGui">
<title>A simple GUI to insert data into a database server.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Screen/simpleInsertGui.screen.png"/>
</imageobject>
<caption>
<para>After clicking <quote>Insert</quote> a message is
presented to the user. This message may as well indicate a
failure.</para>
</caption>
</mediaobject>
</figure>
<para>Implementing Swing GUI applications requires knowledge
taught in e.g. <link
xlink:href="http://www.hdm-stuttgart.de/studenten/stundenplan/vorlesungsverzeichnis/vorlesung_detail?vorlid=5212221">113300
Entwicklung von Web-Anwendungen</link>. If you do not (yet) feel
comfortable writing <productname
xlink:href="http://docs.oracle.com/javase/tutorial/uiswing/index.html">Swing</productname>
applications you may want to read <uri
xlink:href="http://www.javamex.com/tutorials/swing">http://www.javamex.com/tutorials/swing</uri>
and <emphasis role="bold">really</emphasis> understand the examples
presented therein.</para>
<qandaset defaultlabel="qanda" xml:id="quandaentry_GuiDb">
<title>GUI for inserting Person data to a database server</title>
<qandadiv>
<qandaentry>
<question>
<para>Write a GUI application as outlined in <xref
linkend="simpleInsertGui"/>. You may proceed as
follows:</para>
<orderedlist>
<listitem>
<para>Write a dummy GUI without any database
functionality. Only present the two labels and input fields
and the Insert button.</para>
</listitem>
<listitem>
<para>Add a
<classname>java.awt.event.ActionListener</classname> which
generates an SQL INSERT statement when the Insert button is
clicked. Present this string to the user as shown in
the message window of <xref
linkend="simpleInsertGui"/>.</para>
<para>At this point you still do not need a database
connection. The message shown to the user is just a fake,
so the GUI <emphasis role="bold">appears</emphasis> to be
working.</para>
</listitem>
<listitem>
<para>Establish a
<classname>java.sql.Connection</classname> and create a
<classname>java.sql.Statement</classname> instance when
launching your application. Use the latter in your
<classname>java.awt.event.ActionListener</classname> to
actually insert datasets into your database.</para>
</listitem>
</orderedlist>
</question>
<answer>
<para>The complete implementation resides in
<classname>sda.jdbc.intro.v01.InsertPerson</classname>:</para>
<programlisting language="none">package sda.jdbc.intro.v01;
import ...
public class InsertPerson extends JFrame {
...
public InsertPerson () throws SQLException{
super ("Add a person's data");
setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
final JPanel databaseFieldPanel = new JPanel();
databaseFieldPanel.setLayout(new GridLayout(0,2));
add(databaseFieldPanel, BorderLayout.CENTER);
databaseFieldPanel.add(new JLabel("Name:"));
final JTextField nameField = new JTextField(15);
databaseFieldPanel.add(nameField);
databaseFieldPanel.add(new JLabel("E-mail:"));
final JTextField emailField = new JTextField(15);
databaseFieldPanel.add(emailField);
final JButton insertButton = new JButton("Insert");
add(insertButton, BorderLayout.SOUTH);
final Connection conn = DriverManager.getConnection(
"jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ");
final Statement stmt = conn.createStatement();
insertButton.addActionListener(new ActionListener() {
// Linking the GUI to the database server. We assume an open
// connection and a correctly initialized Statement instance
@Override
public void actionPerformed(ActionEvent event) {
final String sql = "INSERT INTO Person VALUES('" + nameField.getText()+ "', '"
+ emailField.getText() + "')";
// We have to catch this Exception because an ActionListener's signature
// prohibits the existence of a "throws" clause.
try {
final int updateCount = stmt.executeUpdate(sql);
JOptionPane.showMessageDialog(null, "Successfully executed \n'" + sql + "'\nand inserted "
+ updateCount + " dataset");
} catch (SQLException e) {
e.printStackTrace();
}
}
});
pack();
}
}</programlisting>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="jdbcExceptions">
<title>Handling possible exceptions</title>
<para>Our current code lacks proper error handling: exceptions
are either not caught at all, leading to program termination, or are
merely printed as a stack trace. This is of course inadequate for
professional software. In case of problems we have to:</para>
<itemizedlist>
<listitem>
<para>Gracefully recover or shut down our application. We may for
example show a pop-up window <quote>Terminating due to an internal
error</quote>.</para>
</listitem>
<listitem>
<para>Enable the customer to supply the development team with
helpful information. The user may for example be asked to submit a
log file in case of errors.</para>
</listitem>
</itemizedlist>
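<para>The two requirements above may be sketched as follows. This is a
minimal illustration rather than part of the lecture's code base: the
class name <code>ErrorReportSketch</code>, the method
<code>report</code> and the message wording are assumptions. The
default <classname>java.util.logging.Logger</classname> configuration
writes to the console; a
<classname>java.util.logging.FileHandler</classname> would have to be
added in order to produce an actual log file.</para>
<programlisting language="none">import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper class, not part of the lecture's source code
class ErrorReportSketch {

  private static final Logger log =
      Logger.getLogger(ErrorReportSketch.class.getName());

  /**
   * Log technical details for the development team and return a short
   * text suitable for a pop-up window.
   */
  public static String report(final String action, final SQLException e) {
    // Message, SQLState and stack trace are meant for developers only
    log.log(Level.SEVERE, action + " failed, SQLState=" + e.getSQLState(), e);
    // The end user just receives a short, non-technical explanation
    return "An internal error occurred while trying to " + action
        + ". Please submit the log output to our support team.";
  }

  public static void main(String[] args) {
    // Simulated failure, no database connection required
    final SQLException e = new SQLException(
        "Duplicate entry 'jim@foo.org' for key 'email'", "23000");
    System.err.println(report("insert a person", e));
  }
}</programlisting>
<para>In a Swing application the returned text might be passed to
<code>JOptionPane.showMessageDialog(...)</code> instead of being
printed.</para>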
<para>In addition the solution
<classname>sda.jdbc.intro.v01.InsertPerson</classname> contains an
ugly mix of GUI components and database related code. We take a first
step to decouple these two distinct concerns:</para>
<qandaset defaultlabel="qanda" xml:id="quandaentry_DbLayer">
<title>Handling the database layer</title>
<qandadiv>
<qandaentry>
<question>
<para>Implement a class <code>PersistenceHandler</code> to be
used later as a component of our next GUI application
prototype. This class should have the following
methods:</para>
<programlisting language="none">...
/**
* Handle database communication. There are two
* distinct internal states <q>disconnected</q> and <q>connected</q>, see
* {@link #isConnected()}. These two states may be toggled by invoking
* {@link #connect()} and {@link #disconnect()} respectively.
*
* The following snippet illustrates the intended usage:
* <pre> public static void main(String[] args) {
final PersistenceHandler ph = new PersistenceHandler();
if (ph.connect()) {
if (!ph.add("Jim", "jim@foo.com")) {
System.err.println("Insert Error:" + ph.getErrorMessage());
}
} else {
System.err.println("Connect error:" + ph.getErrorMessage());
}
}</pre>
*
* @author goik
*/
public class PersistenceHandler {
...
/**
* Instance in <q>disconnected</q> state. See {@link #isConnected()}
*/
public PersistenceHandler() {/* only present here to supply Javadoc comment */}
/**
* Inserting a (name, email) record into the database server. In case of
* errors corresponding messages may subsequently be retrieved by calling
* {@link #getErrorMessage()}.
*
* <dt><b>Precondition:</b></dt> <dd>must be in
* <q>connected</q> state, see {@link #isConnected()}</dd>
*
* @param name
* A person's name
* @param email
* A person's email address
*
* @return true if the current data record has been successfully inserted
* into the database server. false in case of error(s).
*/
public boolean add(final String name, final String email){
...
}
/**
* Retrieving error messages in case a call to {@link #add(String, String)},
* {@link #connect()}, or {@link #disconnect()} yields an error.
*
* @return the error explanation corresponding to the latest failed
* operation, null if no error yet occurred.
*/
public String getErrorMessage() {
return ...;
}
/**
* Open a connection to a database server.
*
* <dt><b>Precondition:</b></dt>
* <dd>must be in <q>disconnected</q> state, see {@link #isConnected()}</dd>
*
* <dt><b>Precondition:</b></dt>
* <dd>The following properties must be set:
* <pre>PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm
PersistenceHandler.password=XYZ
PersistenceHandler.username=foo</pre>
* </dd>
*
* @return true if connecting was successful
*/
public boolean connect () {
...
}
/**
* Close a connection to a database server and clean up JDBC related resources
*
* Error messages in case of failure may subsequently be retrieved by
* calling {@link #getErrorMessage()}.
*
* <dt><b>Precondition:</b></dt>
* <dd>must be in <q>connected</q> state, see {@link #isConnected()}</dd>
*
* @return true if disconnecting was successful, false in case error(s) occur.
*/
public boolean disconnect() {
...
}
/**
* An instance can either be in <q>connected</q> or <q>disconnected</q> state. The
* state can be toggled by invoking {@link #connect()} or
* {@link #disconnect()} respectively.
*
* @return true if connected, false otherwise
*/
public boolean isConnected() {
return ...;
}
}</programlisting>
<para>Notice the two internal states
<quote>disconnected</quote> and
<quote>connected</quote>:</para>
<figure xml:id="figPersistenceHandlerStates">
<title>Possible states and transitions for instances of
<code>PersistenceHandler</code>.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/persistHandlerStates.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>According to the above documentation a newly created
<code>PersistenceHandler</code> instance should be in
disconnected state. As shown in the <link
linkend="gloss_Java"><trademark>Java</trademark></link> class
description you may test your implementation without any GUI
code. If you are already familiar with unit testing this might
be a good start as well.</para>
</question>
<answer>
<para>We show a possible implementation of
<classname>sda.jdbc.intro.v1.PersistenceHandler</classname>:</para>
<programlisting language="none">package sda.jdbc.intro.v1;
...
public class PersistenceHandler {
Connection conn = null;
Statement stmt = null;
String errorMessage = null;
/**
* New instances are in <q>disconnected</q> state. See {@link #isConnected()}
*/
public PersistenceHandler() {/* only present here to supply Javadoc comment */}
/**
* Inserting a (name, email) record into the database server. In case of
* errors corresponding messages may subsequently be retrieved by calling
* {@link #getErrorMessage()}.
*
* <dt><b>Precondition:</b></dt> <dd>must be in
* <q>connected</q> state, see {@link #isConnected()}</dd>
*
* @param name
* A person's name
* @param email
* A person's email address
*
* @return true if the current data record has been successfully inserted
* into the database server. false in case of error(s).
*/
public boolean add(final String name, final String email){
final String sql = "INSERT INTO Person VALUES('" + name + "', '" +
email + "')";
try {
stmt.executeUpdate(sql);
return true;
} catch (SQLException e) {
errorMessage = "Unable to execute '" + sql + "': '" + e.getMessage() + "'";
return false;
}
}
/**
* Retrieving error messages in case a call to {@link #add(String, String)},
* {@link #connect()}, or {@link #disconnect()} yields an error.
*
* @return the error explanation corresponding to the latest failed
* operation, null if no error yet occurred.
*/
public String getErrorMessage() {
return errorMessage;
}
/**
* Open a connection to a database server.
*
* <dt><b>Precondition:</b></dt>
* <dd>must be in <q>disconnected</q> state, see {@link #isConnected()}</dd>
*
* <dt><b>Precondition:</b></dt>
* <dd>The following properties must be set:
* <pre>PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm
PersistenceHandler.password=XYZ
PersistenceHandler.username=foo</pre>
* </dd>
*
* @return true if connecting was successful
*/
public boolean connect () {
try {
conn = DriverManager.getConnection(
DbProps.getString("PersistenceHandler.jdbcUrl"),
DbProps.getString("PersistenceHandler.username"),
DbProps.getString("PersistenceHandler.password"));
try {
stmt = conn.createStatement();
return true;
} catch (SQLException e) {
errorMessage = "Connection opened but Statement creation failed:\"" + e.getMessage() + "\".";
try {
conn.close();
} catch (SQLException ee) {
errorMessage += "Closing connection failed:\"" + ee.getMessage() + "\".";
}
conn = null;
}
} catch (SQLException e) {
errorMessage = "Unable to open connection:\"" + e.getMessage() + "\".";
}
return false;
}
/**
* Close a connection to a database server and clean up JDBC related resources
*
* Error messages in case of failure may subsequently be retrieved by
* calling {@link #getErrorMessage()}.
*
* <dt><b>Precondition:</b></dt>
* <dd>must be in <q>connected</q> state, see {@link #isConnected()}</dd>
*
* @return true if disconnecting was successful, false in case error(s) occur.
*/
public boolean disconnect() {
boolean resultStatus = true;
final StringBuilder messageCollector = new StringBuilder();
try {
stmt.close();
} catch (SQLException e) {
resultStatus = false;
messageCollector.append("Unable to close Statement:\"" + e.getMessage() + "\".");
}
stmt = null;
try {
conn.close();
} catch (SQLException e) {
resultStatus = false;
messageCollector.append("Unable to close connection:\"" + e.getMessage() + "\".");
}
conn = null;
if (!resultStatus) {
errorMessage = messageCollector.toString();
}
return resultStatus;
}
/**
* An instance can either be in <q>connected</q> or <q>disconnected</q> state. The
* state can be toggled by invoking {@link #connect()} or
* {@link #disconnect()} respectively.
*
* @return true if connected, false otherwise
*/
public boolean isConnected() {
return null != conn;
}
}</programlisting>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<para>We may now complete the next enhancement step of our GUI
database client.</para>
<qandaset defaultlabel="qanda" xml:id="exerciseGuiWriteTakeTwo">
<title>Connection on user action</title>
<qandadiv>
<qandaentry>
<question>
<label>An application writing records to a database
server</label>
<para>Our aim is to enhance the first GUI prototype
described in <xref linkend="simpleInsertGui"/>. The
application shall start in disconnected state with respect to the
database server. Prior to entering data the user shall be guided to
open a connection. The following video illustrates the desired
user interface:</para>
<figure xml:id="figureDataInsert2">
<title>A GUI frontend for adding personal data to a
server.</title>
<mediaobject>
<videoobject>
<videodata fileref="Ref/Video/dataInsert.mp4"/>
</videoobject>
</mediaobject>
</figure>
<para>If a user closes the main window while still being
connected, a disconnect from the database server shall be
enforced. For this purpose we must handle the event raised when the
user clicks the close button within the window
decoration. An exit handler method is required to
terminate a potentially open database connection.</para>
</question>
<answer>
<para>Our implementation uses the class
<classname>sda.jdbc.intro.v1.PersistenceHandler</classname>
for handling all database communication. The GUI needs to
visualize the two different states <quote>disconnected</quote>
and <quote>connected</quote>. In <quote>disconnected</quote>
state the whole input pane for entering datasets and clicking
the <quote>Insert</quote> button is locked. So the user is
forced to actively open a database connection.</para>
<para>Notice also the
<classname>java.awt.event.WindowAdapter</classname>
implementation executed when closing the application's
main window. The
<methodname>java.awt.event.WindowAdapter.windowClosing(java.awt.event.WindowEvent)</methodname>
method disconnects any existing database connection, thus
freeing resources.</para>
<programlisting language="none">package sda.jdbc.intro.v1;
import ...
public class InsertPerson extends JFrame {
private static final long serialVersionUID = 6815975741605247675L;
final PersistenceHandler persistenceHandler = new PersistenceHandler();
final JTextField nameField = new JTextField(15),
emailField = new JTextField(20);
final JButton toggleConnectButton = new JButton(),
insertButton = new JButton("Insert");
final JPanel databaseFieldPanel = new JPanel();
private void setGuiConnectionState(final boolean state) {
if (state) {
toggleConnectButton.setText("Disconnect");
} else {
toggleConnectButton.setText("Connect");
}
for (final Component c: databaseFieldPanel.getComponents()){
c.setEnabled(state);
}
}
public static void main(String[] args) throws SQLException {
InsertPerson app = new InsertPerson();
app.setVisible(true);
}
public InsertPerson (){
super ("Add a person's data");
setSize(500, 500);
addWindowListener(new WindowAdapter() {
// In case a user closes our application window while still being connected
// we have to close the database connection.
@Override
public void windowClosing(WindowEvent e) {
super.windowClosing(e);
if (persistenceHandler.isConnected() && !persistenceHandler.disconnect()) {
System.exit(1);
} else {
System.exit(0);
}
}
});
Box top = Box.createHorizontalBox();
add(top, BorderLayout.NORTH);
top.add(toggleConnectButton);
toggleConnectButton.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
if (persistenceHandler.isConnected()) {
if (persistenceHandler.disconnect()){
setGuiConnectionState(false);
} else {
JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage());
}
} else {
if (persistenceHandler.connect()){
setGuiConnectionState(true);
} else {
JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage());
}
}
}
});
databaseFieldPanel.setLayout(new GridLayout(0,2));
add(databaseFieldPanel);
databaseFieldPanel.add(new JLabel("Name:"));
databaseFieldPanel.add(nameField);
databaseFieldPanel.add(new JLabel("E-mail:"));
databaseFieldPanel.add(emailField);
insertButton.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
if (persistenceHandler.add(nameField.getText(), emailField.getText())) {
nameField.setText("");
emailField.setText("");
JOptionPane.showMessageDialog(null, "Successfully inserted dataset");
} else {
JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage());
}
}
});
databaseFieldPanel.add(Box.createGlue());
databaseFieldPanel.add(insertButton);
setGuiConnectionState(false);
pack();
}
}</programlisting>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="jdbcSecurity">
<title><trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
and security</title>
<section xml:id="jdbcSecurityNetwork">
<title>Network sniffing</title>
<para>Sniffing <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
network traffic is one possibility for intruders to compromise
database applications. This requires physical access to either
of:</para>
<itemizedlist>
<listitem>
<para>Server host</para>
</listitem>
<listitem>
<para>Client host</para>
</listitem>
<listitem>
<para>Intermediate hub, switch or router</para>
</listitem>
</itemizedlist>
<figure xml:id="figJdbcSniffing">
<title>Sniffing a <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
connection by an intruder.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/jdbcSniffing.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>We demonstrate a possible attack by analyzing the network
traffic between our application shown in <xref
linkend="figJdbcSimpleWrite"/> and the <productname
xlink:href="http://www.mysql.com">Mysql</productname> database
server. Prior to starting the application we set up <productname
xlink:href="http://www.wireshark.org">Wireshark</productname> for
filtered capturing:</para>
<itemizedlist>
<listitem>
<para>Connecting to the <varname>loopback</varname> (lo)
interface only. This is sufficient since our client connects to
<varname>localhost</varname>.</para>
</listitem>
<listitem>
<para>Capturing only <acronym
xlink:href="http://en.wikipedia.org/wiki/Transmission_Control_Protocol">TCP</acronym>
packets having port number 3306. All other packets are filtered out.</para>
</listitem>
</itemizedlist>
<para>This yields the following capture, shortened for the sake
of brevity:</para>
<programlisting language="none">[...
5.5.24-0ubuntu0.12.04.1.%...X*e?I1ZQ...................e,F[yoA5$T[N.mysql_native_password.
A...........!.......................hdmuser <co xml:id="tcpCaptureUsername"/>......U.>S.%..~h...!.xhdm............j..../*
... INSERT INTO Person VALUES('Jim', 'jim@foo.org') <co
xml:id="tcpCaptureSqlInsert"/>6...
.&.#23000Duplicate entry 'jim@foo.org' for key 'email' <co
xml:id="tcpCaptureErrmsg"/></programlisting>
<calloutlist>
<callout arearefs="tcpCaptureUsername">
<para>The <varname>username</varname> initiating the connection
to the database server.</para>
</callout>
<callout arearefs="tcpCaptureSqlInsert">
<para>The <code>INSERT ...</code> statement.</para>
</callout>
<callout arearefs="tcpCaptureErrmsg">
<para>The resulting error message being sent back to the
client.</para>
</callout>
</calloutlist>
<para>Something seems to be missing here: The user's password. Our
code in <xref linkend="figJdbcSimpleWrite"/> contains the password
<quote><varname>XYZ</varname></quote> in clear text. But even using
the search function of <productname
xlink:href="http://www.wireshark.org">Wireshark</productname> does
not show any such string within the above capture. The <productname
xlink:href="http://www.mysql.com">Mysql</productname> documentation
however <link
xlink:href="http://dev.mysql.com/doc/refman/5.0/en/security-against-attack.html">reveals</link>
that everything but the password is transmitted in clear text. So
all we might identify is a hash of <code>XYZ</code>.</para>
<para>So regarding our (current) <productname
xlink:href="http://www.mysql.com">Mysql</productname> implementation
the impact of this attack type is somewhat limited but still severe:
All data transmitted between client and server may be
disclosed. This typically comprises sensitive data as well. Possible
solutions:</para>
<itemizedlist>
<listitem>
<para>Create an encrypted tunnel between client and server, for
example by <link
xlink:href="http://www.debianadmin.com/howto-use-ssh-local-and-remote-port-forwarding.html">ssh
port forwarding</link> or <link
xlink:href="http://de.wikipedia.org/wiki/Virtual_Private_Network">VPN</link>.</para>
</listitem>
<listitem>
<para>Many database vendors <link
xlink:href="http://dev.mysql.com/doc/refman/5.1/de/connector-j-reference-using-ssl.html">supply
SSL</link> or similar <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
protocol encryption extensions. This requires additional
configuration procedures like setting up server side
certificates. Moreover similar to the http/https protocols
encryption generally slows down data traffic.</para>
</listitem>
</itemizedlist>
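<para>As a hedged illustration of the second option, the following
sketch assembles connection properties requesting an encrypted
connection. The property names <code>useSSL</code> and
<code>requireSSL</code> are taken from the driver configuration page
linked above; newer driver versions may prefer other settings, so
consult the documentation matching your driver. The class name
<code>SslPropsSketch</code> is hypothetical and no server side
certificate setup is shown.</para>
<programlisting language="none">import java.util.Properties;

// Hypothetical helper class, not part of the lecture's source code
class SslPropsSketch {

  /** Collect credentials and SSL related settings in one Properties object. */
  public static Properties sslProperties(final String user, final String password) {
    final Properties props = new Properties();
    props.setProperty("user", user);         // standard JDBC property key
    props.setProperty("password", password); // standard JDBC property key
    props.setProperty("useSSL", "true");     // ask the driver to use SSL
    props.setProperty("requireSSL", "true"); // refuse unencrypted connections
    return props;
  }

  public static void main(String[] args) {
    final Properties props = sslProperties("hdmuser", "XYZ");
    System.out.println("useSSL=" + props.getProperty("useSSL"));
    // With a running, SSL enabled server one would continue with:
    // DriverManager.getConnection("jdbc:mysql://localhost:3306/hdm", props);
  }
}</programlisting>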
<para>Of course this is only relevant if the transport layer is
considered to be insecure. If both server and client reside within
the same trusted infrastructure no action has to be taken. We also
note that this kind of problem is not limited to <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>.
In fact all protocols lacking encryption are subject to this type of
attack.</para>
</section>
<section xml:id="sqlInjection">
<title>SQL injection</title>
<para>Before diving into technical details we shed some light on the
possible impact of this common attack type. Our example is the
well-known Heartland Payment Systems
data breach:</para>
<figure xml:id="figHeartlandSecurityBreach">
<title>Summary about possible SQL injection impact based on the
Heartland security breach</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/heartland.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>Why should we be concerned with SQL injection? In the
introduction of <xref linkend="bib_Clarke09"/> a compelling argument
is given:</para>
<blockquote>
<para>Many people say they know what SQL injection is, but all
they have heard about or experienced are trivial examples. SQL
injection is one of the most devastating vulnerabilities to impact
a business, as it can lead to exposure of all of the sensitive
information stored in an application's database, including handy
information such as usernames, passwords, names, addresses, phone
numbers, and credit card details.</para>
</blockquote>
<para>In this lecture, due to limited resources, we only deal with
the trivial examples mentioned above. One possible way SQL injection
attacks work is by inserting SQL code into fields designed for
end user input:</para>
<figure xml:id="figSqlInject">
<title>SQL injection triggered by ordinary user input.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/sqlinject.fig"/>
</imageobject>
</mediaobject>
</figure>
<qandaset defaultlabel="qanda" xml:id="sqlInjectDropTable">
<title>Attack from the dark side</title>
<qandadiv>
<qandaentry>
<question>
<para>Use the application from <xref
linkend="exerciseGuiWriteTakeTwo"/> and <xref
linkend="figSqlInject"/> to launch a SQL injection attack.
We provide some hints:</para>
<orderedlist>
<listitem>
<para>The <productname
xlink:href="http://www.mysql.com">Mysql</productname>
<trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
driver implementation already provides precautions to
hamper SQL injection attacks. In its default
configuration a sequence of SQL commands separated by
semicolons (<quote>;</quote>) will not be executed but
flagged as a SQL syntax error. We take an
example:</para>
<programlisting language="none">INSERT INTO Person VALUES (...);DROP TABLE Person</programlisting>
<para>In order to execute these so-called multi
queries we explicitly have to enable a <productname
xlink:href="http://www.mysql.com">Mysql</productname>
property. This may be achieved by extending our
<trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
URL:</para>
<programlisting language="none">jdbc:mysql://localhost:3306/hdm?<emphasis
role="bold">allowMultiQueries=true</emphasis></programlisting>
<para>The <productname
xlink:href="http://www.mysql.com">Mysql</productname>
manual <link
xlink:href="http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-configuration-properties.html">contains
</link>a remark regarding this parameter:</para>
<remark>Notice that this has the potential for SQL
injection if using plain java.sql.Statements and your
code doesn't sanitize input correctly.</remark>
<para>In other words: You have been warned!</para>
</listitem>
<listitem>
<para>You may now use either of the two input fields
<quote>name</quote> or <quote>email</quote> to inject
arbitrary SQL code.</para>
</listitem>
</orderedlist>
</question>
<answer>
<para>We construct a suitable string to be injected in order to
drop our <code>Person</code> table:</para>
<programlisting language="none">Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting>
<para>Entering this into the name field effectively kills our
<code>Person</code> relation. As the error
message shows, two INSERT statements are separated by a DROP
TABLE statement. So after executing the first INSERT our
database server drops the whole table. Finally the second
INSERT statement fails, giving rise to an error message no
end user will ever understand:</para>
<figure xml:id="figSqlInjectDropPerson">
<title>Dropping the <code>Person</code> table by SQL
injection</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Screen/sqlInject.screen.png"/>
</imageobject>
</mediaobject>
</figure>
<para>According to the message text the table
<code>Person</code> gets dropped as expected. Thus the
subsequent (second) <code>INSERT</code> action is bound to
fail.</para>
<para>In practice this result may be avoided. The database
user will (hopefully!) not have sufficient permissions to
drop the whole table. Malicious modifications by INSERT,
UPDATE or DELETE statements are still possible.</para>
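<para>The mechanism can be reproduced without any database
connection. The following sketch merely repeats the string
concatenation used by our <code>add(String, String)</code> method and
prints the statement that would be sent to the server. The class name
<code>InjectionSketch</code> and the e-mail value
<code>joe@foo.org</code> are assumptions chosen for
illustration.</para>
<programlisting language="none">// Hypothetical demo class, not part of the lecture's source code
class InjectionSketch {

  /** Same concatenation scheme as in PersistenceHandler.add(...) */
  public static String buildInsert(final String name, final String email) {
    return "INSERT INTO Person VALUES('" + name + "', '" + email + "')";
  }

  public static void main(String[] args) {
    final String maliciousName =
        "Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe";
    // Prints three statements separated by semicolons:
    // INSERT INTO Person VALUES('Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe', 'joe@foo.org')
    System.out.println(buildInsert(maliciousName, "joe@foo.org"));
  }
}</programlisting>
<para>The quote characters contained in the user input terminate the
string literal prematurely. Everything following it is interpreted as
SQL code rather than data.</para>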
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="sanitizeUserInput">
<title>Sanitizing user input</title>
<para>There are at least two general ways to deal with the
disastrous result of <xref linkend="sqlInjectDropTable"/>:</para>
<itemizedlist>
<listitem>
<para>Keep the database server from interpreting user input
completely. This is probably the best way and will be discussed
in <xref linkend="sectPreparedStatements"/>.</para>
</listitem>
<listitem>
<para>Let the application check and process user input.
Dangerous user input may be modified prior to being embedded in
SQL statements or being rejected completely.</para>
</listitem>
</itemizedlist>
<para>The first method is definitely superior in most cases. There
are however cases where the restrictions it implies are too
severe. We may for example choose dynamically which tables shall be
accessed. So an SQL statement's structure rather than just its
predicates is affected by user input. There are at least two
standard procedures dealing with this problem:</para>
<glosslist>
<glossentry>
<glossterm>Input Filtering</glossterm>
<glossdef>
<para>In the simplest case we check a user's input by regular
expressions. An example is an input field in a login window
representing a system user name. Legal input may allow
letters and digits only. Special characters, whitespace etc.
are typically prohibited. The input must have a minimum length
of one character. A maximum length may be imposed as well. So
we may choose the regular expression <code>[A-Za-z0-9]+</code>
to check valid user names.</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm><foreignphrase>Whitelisting</foreignphrase></glossterm>
<glossdef>
<para>In many cases input fields only allow a restricted set
of values. Consider an input field for names of planets. An
application may keep a dictionary table to validate user
input:</para>
<informaltable border="1">
<col width="10%"/>
<col width="5%"/>
<tr>
<td>Mercury</td>
<td>1</td>
</tr>
<tr>
<td>Venus</td>
<td>2</td>
</tr>
<tr>
<td>Earth</td>
<td>3</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>Neptune</td>
<td>8</td>
</tr>
<tr>
<td><emphasis role="bold">Default:</emphasis></td>
<td><emphasis role="bold">0</emphasis></td>
</tr>
</informaltable>
<para>So if a user enters a valid planet name a corresponding
number representing this particular planet will be sent to the
database. If the user enters an invalid string an error
message may be raised.</para>
<para>In a GUI in many situations this may be better
accomplished by presenting the list of planets to choose from.
In this case a user has no chance to enter invalid or even
malicious code.</para>
</glossdef>
</glossentry>
</glosslist>
<para>So we have an <quote>interceptor</quote> sitting between user
input fields and SQL generating code:</para>
<figure xml:id="figInputFiltering">
<title>Validating user input prior to dynamically composing SQL
statements.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/filtering.fig"/>
</imageobject>
</mediaobject>
</figure>
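<para>The whitelisting idea described above may be sketched as
follows. This is a minimal illustration under assumptions: the class
name <code>PlanetWhitelistSketch</code> and the method
<code>planetId</code> are hypothetical, and only the first three
dictionary entries are included. Unknown input is mapped to the
default value 0 and never reaches an SQL statement.</para>
<programlisting language="none">// Hypothetical demo class, not part of the lecture's source code
class PlanetWhitelistSketch {

  // Dictionary of accepted values; remaining planets omitted in this sketch
  private static final String[] PLANETS = {"Mercury", "Venus", "Earth"};

  /**
   * @return the number representing the given planet or the
   *         default 0 if the input is not contained in the dictionary
   */
  public static int planetId(final String userInput) {
    int id = 0;
    for (final String planet : PLANETS) {
      id++;
      if (planet.equals(userInput)) {
        return id;
      }
    }
    return 0; // Default: input rejected
  }

  public static void main(String[] args) {
    System.out.println(planetId("Venus"));                     // prints 2
    System.out.println(planetId("x');DROP TABLE Person;--"));  // prints 0
  }
}</programlisting>
<para>Only the integer returned by <code>planetId</code> is embedded
into the SQL statement. The application may raise an error message
whenever 0 is returned.</para>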
<qandaset defaultlabel="qanda" xml:id="quandaentry_RegexpUse">
<title>Using regular expressions in <link
linkend="gloss_Java"><trademark>Java</trademark></link></title>
<qandadiv>
<qandaentry>
<question>
<para>This exercise is a preparation for <xref
linkend="exercisefilterUserInput"/>. The aim is to deal with
regular expressions and to use them in <link
linkend="gloss_Java"><trademark>Java</trademark></link>. If
you don't know yet about regular expressions / pattern
matching you may want to read either of:</para>
<itemizedlist>
<listitem>
<para><link
xlink:href="http://www.aivosto.com/vbtips/regex.html">Regular
expressions - An introduction</link></para>
</listitem>
<listitem>
<para><link
xlink:href="http://www.codeproject.com/Articles/939/An-Introduction-to-Regular-Expressions">An
Introduction to Regular Expressions</link></para>
</listitem>
<listitem>
<para><link
xlink:href="http://www.regular-expressions.info/tutorial.html">Regular
Expression Tutorial</link></para>
</listitem>
</itemizedlist>
<para>Complete the implementation of the following
skeleton:</para>
<programlisting language="none">...
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public static void main(String[] args) {
final String [] wordList = new String [] {"Eric", "126653BBb", "_login","some text"};
final String [] regexpList = new String[] {"[A-K].*", "[^0-9]+.*", "_[a-z]+", ""};
for (final String word: wordList) {
for (final String regexp: regexpList) {
testMatch(word, regexp);
}
}
}
/**
* Matching a given word by a regular expression. A log message is being
* written to stdout.
*
* Hint: The implementation is based on the explanation being given in the
* introduction to {@link Pattern}
*
* @param word This string will be matched by the subsequent argument.
* @param regexp The regular expression tested to match the previous argument.
* @return true if regexp matches word, false otherwise.
*/
public static boolean testMatch(final String word, final String regexp) {
.../* to be implemented by <emphasis role="bold">**YOU**</emphasis> */
}</programlisting>
<para>As noted in the <link
linkend="gloss_Java"><trademark>Java</trademark></link>
documentation comment above, you may want to read the documentation of class
<classname>java.util.regex.Pattern</classname>. The intended
output of the above application is:</para>
<programlisting language="none">The expression '[A-K].*' matches 'Eric'
The expression '[^0-9]+.*' ...
...</programlisting>
</question>
<answer>
<para>A possible implementation is given by
<classname>sda.regexp.RegexpPrimer</classname>.</para>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<qandaset defaultlabel="qanda" xml:id="exercisefilterUserInput">
<title>Input validation by regular expressions</title>
<qandadiv>
<qandaentry>
<question>
<para>The application of <xref
linkend="sqlInjectDropTable"/> proved to be vulnerable to
SQL injection. Sanitize the values of both user input fields to
prevent such attacks.</para>
<itemizedlist>
<listitem>
<para>Find appropriate regular expressions to check both
username and email. Some hints:</para>
<glosslist>
<glossentry>
<glossterm>username</glossterm>
<glossdef>
<para>Regarding SQL injection the <quote>;</quote>
character is among the most critical. You may want
to exclude certain special characters. This doesn't
harm since their presence in a user's name is likely
to be a typo rather than any sensible input.</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm>email</glossterm>
<glossdef>
<para>There are tons of <quote>ultimate</quote>
regular expressions available to check email
addresses. Remember that the present task is to avoid
SQL injection rather than to reject
<quote>wrong</quote> email addresses. So find a reasonable
one which may be too permissive regarding RFC email
syntax rules but is sufficient to secure your
application.</para>
<para>A precise definition of an email address's syntax is
given in <link
xlink:href="http://tools.ietf.org/html/rfc5322#section-3.4.1">RFC5322</link>.
Its implementation is beyond the scope of the current
lecture. Moreover it is questionable whether e-mail
clients and mail transfer agents implement strict
RFC compliance.</para>
</glossdef>
</glossentry>
</glosslist>
<para>Both regular expressions must cover the whole user
input from the beginning to the end. This can be
achieved by using <code>^ ... $</code>.</para>
</listitem>
<listitem>
<para>The <link
linkend="gloss_Java"><trademark>Java</trademark></link>
standard class
<classname>javax.swing.InputVerifier</classname> may
help you validate user input.</para>
</listitem>
<listitem>
<para>The following screenshot may provide an idea for
GUI realization and user interaction in case of errors.
Of course the submit button's action should be disabled
in case of erroneous input. The user should receive a
helpful error message instead.</para>
<figure xml:id="figInsertValidate">
<title>Error message being presented to the
user.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Screen/insertValidate.screen.png"/>
</imageobject>
<caption>
<para>In the current example the trailing
<quote>;</quote> within the E-Mail field is
invalid.</para>
</caption>
</mediaobject>
</figure>
</listitem>
</itemizedlist>
</question>
<answer>
<para>Extending
<classname>javax.swing.InputVerifier</classname> allows us
to build a generic class to filter user text input by
arbitrary regular expressions:</para>
<programlisting language="none">package sda.jdbc.intro.v1.sanitize;
...
public class RegexpVerifier extends InputVerifier {
final Pattern syntaxPattern;
final JLabel validationLabel;
private boolean inputValid = false;
private final String errMsg;
...
public RegexpVerifier (final String regex, final JLabel validationLabel, final String errMsg) {
this.validationLabel = validationLabel;
this.errMsg = errMsg;
syntaxPattern = Pattern.compile(regex);
}
@Override
public boolean verify(JComponent input) {
if (input instanceof JTextField) {
final String userInput = ((JTextField) input).getText();
if (syntaxPattern.matcher(userInput).find()) {
validationLabel.setText("");
inputValid = true;
} else {
validationLabel.setText(errMsg);
inputValid = false;
}
}
return inputValid;
}
public boolean inputIsValid () {
return inputValid;
}
}</programlisting>
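<para>The verifier above calls <code>find()</code>, which succeeds
as soon as any part of the input matches. It therefore relies on
the regular expressions being anchored by <code>^</code> and
<code>$</code>. The following minimal sketch compares
<code>find()</code> with <code>matches()</code>, which always
requires the whole input to match. The class name
<classname>FullMatchSketch</classname> and its two helper methods
are chosen for illustration only and are not part of the lecture
sources:</para>
<programlisting language="none">import java.util.regex.Pattern;

public class FullMatchSketch {

    /** True if regex matches any part of input. */
    static boolean findsSomewhere(final String regex, final String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    /** True only if regex matches input from its beginning to its end. */
    static boolean matchesWhole(final String regex, final String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(final String[] args) {
        final String unanchored = "[^;'\"]+";  // the name pattern without ^ and $
        final String anchored = "^[^;'\"]+$";  // the name pattern used by the verifier
        final String offending = "Jim'; DROP TABLE Person";

        System.out.println(findsSomewhere(unanchored, offending)); // true, "Jim" alone suffices
        System.out.println(findsSomewhere(anchored, offending));   // false
        System.out.println(matchesWhole(unanchored, offending));   // false
    }
}</programlisting>
<para>Forgetting the anchors while using <code>find()</code> would
thus silently accept the offending input.</para>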
<para>Instances of
<classname>sda.jdbc.intro.v1.sanitize.RegexpVerifier</classname>
<coref linkend="emailVerifier"/> <coref
linkend="nameVerifier"/> may now be used to validate our two
input data fields <coref linkend="setNameValidation"/>
<coref linkend="setEmailValidation"/>. We put emphasis on
the changes with respect to
<classname>sda.jdbc.intro.v1.InsertPerson</classname>:</para>
<programlisting language="none">package sda.jdbc.intro.v1.sanitize;
...
public class InsertPerson extends JFrame {
final JTextField nameField = new JTextField(15);
final JLabel nameFieldValidationLabel <co xml:id="nameVerifier"/> = new JLabel();
final RegexpVerifier nameFieldVerifier = new RegexpVerifier(
"^[^;'\"]+$",
nameFieldValidationLabel,
"No special characters");
final JTextField emailField = new JTextField(20);
final JLabel emailFieldValidationLabel <co xml:id="emailVerifier"/> = new JLabel();
final RegexpVerifier emailFieldVerifier =
new RegexpVerifier("^[\\w\\-\\.\\_]+@[\\w\\-\\.]*[a-zA-Z]{2,4}$",
emailFieldValidationLabel,
"email not valid");
...
public static void main(String[] args) throws SQLException {
InsertPerson app = new InsertPerson();
app.setVisible(true);
}
public InsertPerson (){
...
databaseFieldPanel.add(nameField);
<emphasis role="bold">nameFieldValidationLabel.setForeground(Color.RED);
databaseFieldPanel.add(nameFieldValidationLabel);
nameField.setInputVerifier(nameFieldVerifier);</emphasis> <co
xml:id="setNameValidation"/>
databaseFieldPanel.add(new JLabel("E-mail:"));
databaseFieldPanel.add(emailField);
<emphasis role="bold">databaseFieldPanel.add(emailFieldValidationLabel);
emailFieldValidationLabel.setForeground(Color.RED);
emailField.setInputVerifier(emailFieldVerifier);</emphasis> <co
xml:id="setEmailValidation"/>
insertButton.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
<emphasis role="bold">if (!nameFieldVerifier.inputIsValid() || !emailFieldVerifier.inputIsValid()) {
JOptionPane.showMessageDialog(null, "Invalid input value(s)");
}</emphasis> else {
...</programlisting>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
<section xml:id="sectPreparedStatements">
<title><classname>java.sql.PreparedStatement</classname>
objects</title>
<para>Sanitizing user input is an essential means to secure an
application. The <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
standard however provides a superior mechanism for
protecting applications against SQL injection attacks. We first
shed some light on our current mechanism for sending SQL statements to a
database server:</para>
<figure xml:id="sqlTransport">
<title>SQL statements in <link
linkend="gloss_Java"><trademark>Java</trademark></link>
applications get parsed at the database server</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/sqlTransport.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>This architecture raises two questions:</para>
<orderedlist>
<listitem>
<para>What happens in case identical SQL statements are executed
repeatedly? This may happen inside a loop when thousands of
records with identical structure are being sent to a
database.</para>
</listitem>
<listitem>
<para>Is this architecture adequate with respect to security
concerns?</para>
</listitem>
</orderedlist>
<para>The first question is related to performance: Repeatedly parsing
statements which are identical except for the values contained
within is a waste of resources. We consider the transfer of records
between different databases:</para>
<programlisting language="none">INSERT INTO Person VALUES ('Jim', 'jim@q.org')
INSERT INTO Person VALUES ('Eve', 'eve@y.org')
INSERT INTO Person VALUES ('Pete', 'p@rr.com')
...</programlisting>
<para>In this case it does not make sense to repeatedly parse
identical SQL statements. Using single <code>INSERT</code>
statements with multiple data records may not be an option when the
number of records grows.</para>
<para>The second question is related to our current security topic:
The database server's interpreter may be so <quote>kind</quote> as to
interpret an attacker's malicious code as well.</para>
<para>Both topics are addressed by
<classname>java.sql.PreparedStatement</classname> objects. Basically
these objects allow for separating an SQL statement's structure
from the parameter values contained within. The scenario given in <xref
linkend="sqlTransport"/> may be implemented as:</para>
<figure xml:id="sqlTransportPrepare">
<title>Using <classname>java.sql.PreparedStatement</classname>
objects.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/sqlTransportPrepare.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>Prepared statements are an example of parameterized SQL
statements, which exist in various programming languages. When using
<classname>java.sql.PreparedStatement</classname> instances we
actually have three distinct phases:</para>
<orderedlist>
<listitem>
<para xml:id="exerciseGuiWritePrepared">Creating an instance of
<classname>java.sql.PreparedStatement</classname>. The SQL
statement, possibly containing placeholders, gets parsed.</para>
</listitem>
<listitem>
<para>Setting all placeholder values. This does not involve any
further SQL syntax parsing.</para>
</listitem>
<listitem>
<para>Executing the statement.</para>
</listitem>
</orderedlist>
<para>Steps 2 and 3 may be repeated as often as desired without
any re-parsing of SQL statements, thus saving resources on the
database server side.</para>
<para>Our introductory toy application <xref
linkend="figJdbcSimpleWrite"/> may be rewritten using
<classname>java.sql.PreparedStatement</classname> objects:</para>
<programlisting language="none">package sda.jdbc.intro.v1;
...
public class SimpleInsert {
public static void main(String[] args) throws SQLException {
final Connection conn = DriverManager.getConnection (...
// Step 2: Create a PreparedStatement instance
final PreparedStatement pStmt = conn.prepareStatement(
"INSERT INTO Person VALUES(<emphasis role="bold">?, ?</emphasis>)");<co
xml:id="listPrepCreate"/>
// Step 3a: Fill in desired attribute values
pStmt.setString(1, "Jim");<co xml:id="listPrepSet1"/>
pStmt.setString(2, "jim@foo.org");<co xml:id="listPrepSet2"/>
// Step 3b: Execute the desired INSERT
final int updateCount = pStmt.executeUpdate();<co xml:id="listPrepExec"/>
// Step 4: Give feedback to the enduser
System.out.println("Successfully inserted " + updateCount + " dataset(s)");
}
}</programlisting>
<calloutlist>
<callout arearefs="listPrepCreate">
<para>An instance of
<classname>java.sql.PreparedStatement</classname> is
created. Notice the two question marks representing two
placeholders for string values to be inserted in the next
step.</para>
</callout>
<callout arearefs="listPrepSet1 listPrepSet2">
<para>Fill in the two placeholder values defined at <coref
linkend="listPrepCreate"/>.</para>
<caution>
<para>While half the world of programming folks index a
list of n elements from 0 to n-1, <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
counts from 1 to n. Working with <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
would have been too easy otherwise.</para>
</caution>
</callout>
<callout arearefs="listPrepExec">
<para>Execute the beast! Notice the empty parameter list. No SQL
is required since we already prepared it in <coref
linkend="listPrepCreate"/>.</para>
</callout>
</calloutlist>
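<para>The listing above executes its statement only once. As a
minimal sketch of repeating the value setting and execution phases
with a single <classname>java.sql.PreparedStatement</classname>
instance, consider the following helper. The class name
<classname>RepeatedInsertSketch</classname>, the method
<methodname>insertAll</methodname> and the representation of
records as string pairs are assumptions made for illustration and
are not part of the lecture sources:</para>
<programlisting language="none">import java.sql.PreparedStatement;
import java.sql.SQLException;

public class RepeatedInsertSketch {

    /**
     * Inserts all given {name, email} pairs using one prepared statement
     * created from e.g. "INSERT INTO Person VALUES(?, ?)".
     *
     * @return the sum of all update counts
     */
    static int insertAll(final PreparedStatement pStmt, final String[][] persons)
            throws SQLException {
        int total = 0;
        for (final String[] person : persons) {
            pStmt.setString(1, person[0]);   // set first placeholder (name)
            pStmt.setString(2, person[1]);   // set second placeholder (email)
            total += pStmt.executeUpdate();  // execute, the SQL text is not passed again
        }
        return total;
    }
}</programlisting>
<para>A caller would create the statement once by
<code>conn.prepareStatement("INSERT INTO Person VALUES(?, ?)")</code>
and pass it along with all records to be transferred.</para>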
<para>The problem of SQL injection disappears when all user supplied
values are passed as parameters to
<classname>java.sql.PreparedStatement</classname> instances. An
attacker may safely enter offending strings like:</para>
<programlisting language="none">Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting>
<para>The above string will be taken <quote>as is</quote> and thus
simply becomes part of the database server's content.</para>
<qandaset defaultlabel="qanda" xml:id="exerciseSqlInjectPrepare">
<title>Prepared Statements to keep the barbarians at the
gate</title>
<qandadiv>
<qandaentry>
<question>
<para>In <xref linkend="sqlInjectDropTable"/> we found our
implementation in <xref linkend="exerciseGuiWriteTakeTwo"/>
to be vulnerable with respect to SQL injection. Rather than
sanitizing user input you shall use
<classname>java.sql.PreparedStatement</classname> objects to
secure the application.</para>
</question>
<answer>
<para>Due to our separation of GUI and persistence handling
we only need to re-implement
<classname>sda.jdbc.intro.sqlinject.PersistenceHandler</classname>.
We have to replace <classname>java.sql.Statement</classname>
by <classname>java.sql.PreparedStatement</classname>
instances. A possible implementation is
<classname>sda.jdbc.intro.v1.prepare.PersistenceHandler</classname>.
We may now safely enter offending strings like:</para>
<programlisting language="none">Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting>
<para>This time the input value is taken <quote>as
is</quote> and yields the following error message:</para>
<informalfigure>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Screen/sqlInjectPrepare.screen.png"/>
</imageobject>
</mediaobject>
</informalfigure>
<para>The offending string exceeds the length of the
attribute <code>name</code> within the database table
<code>Person</code>. We may enlarge this value to allow the
<code>INSERT</code> operation:</para>
<programlisting language="none">CREATE TABLE Person (
name char(<emphasis role="bold">80</emphasis>) <emphasis role="bold">-- a little bit longer --</emphasis>
,email CHAR(20) UNIQUE
);</programlisting>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<para>We could have followed a test-driven development approach. In
that case we would have written tests before actually implementing
our application. In the current lecture we do this the other
way round in the following exercise. The idea is to assure software
quality when fixing bugs or extending an application.</para>
<para>The subsequent exercise requires the <productname
xlink:href="http://testng.org/doc/eclipse.html#eclipse-installation">TestNG</productname>
plugin for Eclipse to be installed. This should already be the case
both in the MI exercise classrooms and in the Virtualbox image
provided at <uri
xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi">ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi</uri>.
If you use a private Eclipse installation you may want to follow
<xref linkend="testngInstall"/>.</para>
<qandaset defaultlabel="qanda" xml:id="quandaentry_DbLayerUnitTest">
<title>Testing
<classname>sda.jdbc.intro.v1.PersistenceHandler</classname> using
<productname
xlink:href="http://testng.org">TestNG</productname></title>
<qandadiv>
<qandaentry>
<question>
<para>Read <xref linkend="chapUnitTesting"/>. Then
test:</para>
<itemizedlist>
<listitem>
<para>Proper behaviour when opening and closing
connections.</para>
</listitem>
<listitem>
<para>Proper behaviour when inserting data.</para>
</listitem>
<listitem>
<para>Expected behaviour when entering duplicate values
violating integrity constraints. Look for error messages
as well.</para>
</listitem>
</itemizedlist>
<para>You may write code to initialize the database state
appropriately prior to starting the tests.</para>
</question>
<answer>
<para>A possible <productname
xlink:href="http://testng.org">TestNG</productname> test class is
<classname>sda.jdbc.intro.v1.prepare.PersistenceHandlerTest</classname>.</para>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
</section>
<section xml:id="jdbcRead">
<title>Read Access</title>
<para>So far we've sent records to a database server. Applications
however need both directions: pushing data to a server and receiving
data as well. The overall process looks like this:</para>
<figure xml:id="jdbcReadWrite">
<title>Server / client object's life cycle</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/jdbcReadWrite.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>So far we've only covered the second (<code>UPDATE</code>) part
of this picture. Reading objects from a database server into a
client's (transient) address space requires a container object to hold
the data in question. Though <link
linkend="gloss_Java"><trademark>Java</trademark></link> offers
standard container interfaces like
<classname>java.util.List</classname>, the <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
standard defines its own interfaces like
<classname>java.sql.ResultSet</classname>. Instances of
<classname>java.sql.ResultSet</classname> hold transient copies
of (database) objects. The next figure outlines the basic
approach:</para>
<figure xml:id="figJdbcRead">
<title>Reading data from a database server.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/jdbcread.fig" scale="65"/>
</imageobject>
</mediaobject>
</figure>
<para>We take an example. Suppose our database contains a table of our
friends' nicknames and their respective birth dates:</para>
<table border="1" xml:id="figRelationFriends">
<caption>Names and birth dates of friends.</caption>
<tr>
<td><programlisting language="none">CREATE TABLE Friends (
id INTEGER NOT NULL PRIMARY KEY
,nickname char(10)
,birthdate DATE
);</programlisting></td>
<td><programlisting language="none">INSERT INTO Friends VALUES
(1, 'Jim', '1991-10-10')
,(2, 'Eve', '2003-05-24')
,(3, 'Mick','2001-12-30')
;</programlisting></td>
</tr>
</table>
<para>Following the outline in <xref linkend="figJdbcRead"/> we may
access our data by:</para>
<figure xml:id="listingJdbcRead">
<title>Accessing relational data</title>
<programlisting language="none">package sda.jdbc.intro;
...
public class SimpleRead {
public static void main(String[] args) throws SQLException {
// Step 1: Open a connection to the database server
final Connection conn = DriverManager.getConnection (
DbProps.getString("PersistenceHandler.jdbcUrl"),
DbProps.getString("PersistenceHandler.username"),
DbProps.getString("PersistenceHandler.password"));
// Step 2: Create a Statement instance
final Statement stmt = conn.createStatement();
<emphasis role="bold">// Step 3: Creating the client side JDBC container holding our data records</emphasis>
<emphasis role="bold">final ResultSet data = stmt.executeQuery("SELECT * FROM Friends");</emphasis> <co
linkends="listingJdbcRead-1" xml:id="listingJdbcRead-1-co"/>
<emphasis role="bold">// Step 4: Dataset iteration
while (data.next()) {</emphasis> <co linkends="listingJdbcRead-2"
xml:id="listingJdbcRead-2-co"/>
<emphasis role="bold">System.out.println(data.getInt("id")</emphasis> <co
linkends="listingJdbcRead-3" xml:id="listingJdbcRead-3-co"/>
<emphasis role="bold">+ ", " + data.getString("nickname")</emphasis> <co
linkends="listingJdbcRead-3" xml:id="listingJdbcRead-4-co"/>
<emphasis role="bold">+ ", " + data.getString("birthdate"));</emphasis> <co
linkends="listingJdbcRead-3" xml:id="listingJdbcRead-5-co"/>
}
}
}</programlisting>
</figure>
<para>The marked code segments above show the differences with respect to
our data insertion application
<classname>sda.jdbc.intro.SimpleInsert</classname>. Some remarks are
in order:</para>
<calloutlist>
<callout arearefs="listingJdbcRead-1-co" xml:id="listingJdbcRead-1">
<para>As mentioned in the introduction to this section the
<trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
standard comes with its own container interface rather than
<classname>java.util.List</classname> or similar.</para>
</callout>
<callout arearefs="listingJdbcRead-2-co" xml:id="listingJdbcRead-2">
<para>Calling <link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#next()">next()</link>
prior to actually accessing data on the client side is mandatory!
On its first invocation the <link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#next()">next()</link>
method moves the internal cursor to the first row of our
dataset if it is not empty. It returns <code>false</code> when no
further row exists. Follow the link address and <emphasis
role="bold">read</emphasis> the documentation.</para>
</callout>
<callout arearefs="listingJdbcRead-3-co listingJdbcRead-4-co listingJdbcRead-5-co"
xml:id="listingJdbcRead-3">
<para>The access methods have to be chosen according to matching
types. An overview of database/<link
linkend="gloss_Java"><trademark>Java</trademark></link> type
mappings is given in <uri
xlink:href="http://docs.oracle.com/javase/1.3/docs/guide/jdbc/getstart/mapping.html">http://docs.oracle.com/javase/1.3/docs/guide/jdbc/getstart/mapping.html</uri>.</para>
</callout>
</calloutlist>
<qandaset defaultlabel="qanda" xml:id="quandaentry_JdbcTypeConversion">
<title>Getter methods and type conversion</title>
<qandadiv>
<qandaentry>
<question>
<para>Apart from type mappings the <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
access methods like <link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getString(int)">getString()</link>
may also be used for type conversion. Modify <xref
linkend="listingJdbcRead"/> by:</para>
<itemizedlist>
<listitem>
<para>Read the database attribute <code>id</code> by <link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getString(java.lang.String)">getString(String)</link>.</para>
</listitem>
<listitem>
<para>Read the database attribute <code>nickname</code> by <link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getInt(java.lang.String)">getInt(String)</link>.</para>
</listitem>
</itemizedlist>
<para>What do you observe?</para>
</question>
<answer>
<para>Modifying our iteration loop:</para>
<programlisting language="none">// Step 4: Dataset iteration
while (data.next()) {
System.out.println(data.<emphasis role="bold">getString</emphasis>("id") <co
linkends="jdbcReadWrongType-1"
xml:id="jdbcReadWrongType-1-co"/>
+ ", " + data.<emphasis role="bold">getInt</emphasis>("nickname") <co
linkends="jdbcReadWrongType-2"
xml:id="jdbcReadWrongType-2-co"/>
+ ", " + data.getString("birthdate"));
}</programlisting>
<para>We observe:</para>
<calloutlist>
<callout arearefs="jdbcReadWrongType-1-co"
xml:id="jdbcReadWrongType-1">
<para>Calling <link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getString(int)">getString()</link>
for a database attribute of type INTEGER does not cause
any trouble: The value gets silently converted to a string
value.</para>
</callout>
<callout arearefs="jdbcReadWrongType-2-co"
xml:id="jdbcReadWrongType-2">
<para>Calling <link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getInt(java.lang.String)">getInt(String)</link>
for the database field of type CHAR yields an (expected)
Exception:</para>
</callout>
</calloutlist>
<programlisting language="none">Exception in thread "main" java.sql.SQLException: Invalid value for getInt() - 'Jim'
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
...</programlisting>
<para>We may however provide <quote>compatible</quote> data
records:</para>
<programlisting language="none">DELETE FROM Friends;
INSERT INTO Friends VALUES (1, <emphasis role="bold">'31'</emphasis>, '1991-10-10');</programlisting>
<para>This time our application executes perfectly
well:</para>
<programlisting language="none">1, 31, 1991-10-10</programlisting>
<para>Conclusion: The <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>
driver converts the string value to an integer,
similar to the <link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html#parseInt(java.lang.String)">parseInt(String)</link>
method.</para>
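<para>To illustrate which strings such a conversion accepts, the
following sketch applies
<code>Integer.parseInt(String)</code> to both nickname values
from above. The class name
<classname>ParseIntSketch</classname> and its helper method are
chosen for illustration only. The driver's exact rules, for
example regarding surrounding whitespace, may differ, and the
driver reports failures by a
<classname>java.sql.SQLException</classname> rather than a
<classname>java.lang.NumberFormatException</classname>:</para>
<programlisting language="none">public class ParseIntSketch {

    /** Returns the integer represented by text or null if text is no integer. */
    static Integer toIntOrNull(final String text) {
        try {
            return Integer.parseInt(text);
        } catch (final NumberFormatException e) {
            return null;
        }
    }

    public static void main(final String[] args) {
        System.out.println(toIntOrNull("31"));  // 31
        System.out.println(toIntOrNull("Jim")); // null
    }
}</programlisting>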
<para>The next series of exercises aims at a more powerful
implementation of our person data insertion application in
<xref linkend="exerciseInsertLoginCredentials"/>.</para>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<qandaset defaultlabel="qanda" xml:id="quandaentry_HandlingNull">
<title>Handling NULL values.</title>
<qandadiv>
<qandaentry>
<question>
<para>The attribute <code>nickname</code> in our database
table <code>Friends</code> allows <code>NULL</code> values:</para>
<programlisting language="none">INSERT INTO Friends VALUES
(1, 'Jim', '1991-10-10')
,(2, <emphasis role="bold"> NULL</emphasis>, '2003-5-24')
,(3, 'Mick', '2001-12-30');</programlisting>
<para>Starting our current application yields:</para>
<programlisting language="none">1, Jim, 1991-10-10
2, null, 2003-05-24
3, Mick, 2001-12-30</programlisting>
<para>This might be confused with a person having the nickname
<quote>null</quote>. Instead we would like to have:</para>
<programlisting language="none">1, Jim, 1991-10-10
2, -Name unknown- , 2003-05-24
3, Mick, 2001-12-30</programlisting>
<para>Extend the current code of
<classname>sda.jdbc.intro.SimpleRead</classname> to produce
the above result in case of nickname <code>NULL</code>
values.</para>
<para>Hint: Read the documentation of <link
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#wasNull()">wasNull()</link>.</para>
</question>
<answer>
<para>A possible implementation is given in
<classname>sda.jdbc.intro.v1.SimpleRead</classname>.</para>
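<para>The core idea may be sketched as follows. This is not the
code of the class mentioned above: the class name
<classname>NullAwareReadSketch</classname> and the helper method
are assumptions made for illustration. Note that
<code>wasNull()</code> always refers to the column read by the
most recent getter call. For <code>getString(...)</code> a test
for a <code>null</code> return value would work as well, whereas
getters returning primitive types like <code>getInt(...)</code>
return 0 for <code>NULL</code> and thus require
<code>wasNull()</code>:</para>
<programlisting language="none">import java.sql.ResultSet;
import java.sql.SQLException;

public class NullAwareReadSketch {

    /** Reads the nickname of the current row, replacing NULL by a placeholder text. */
    static String nicknameOrPlaceholder(final ResultSet data) throws SQLException {
        final String nickname = data.getString("nickname");
        // wasNull() tells whether the value just read was SQL NULL
        return data.wasNull() ? "-Name unknown-" : nickname;
    }
}</programlisting>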
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<qandaset defaultlabel="qanda" xml:id="exerciseInsecureAuth">
<title>A user authentication <quote>strategy</quote></title>
<qandadiv>
<qandaentry>
<question>
<para>Our current application for entering <code>Person</code>
records lacks authentication: A user simply connects to the
database using credentials hard coded in a properties
file. A programmer suggests implementing authentication based
on the following extension of the <code>Person</code>
table:</para>
<programlisting language="none">CREATE TABLE Person (
name char(80) NOT NULL
,email CHAR(20) NOT NULL UNIQUE
,login CHAR(10) UNIQUE -- login names must be unique --
,password CHAR(20)
);</programlisting>
<para>On clicking <quote>Connect</quote> a user may enter his
login name and password, <quote>fred</quote> and
<quote>12345678</quote> in the following example:</para>
<figure xml:id="figLogin">
<title>Login credentials for database connection</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Screen/login.screen.png"
scale="90"/>
</imageobject>
</mediaobject>
</figure>
<para>Based on these input values the following SQL query is
being executed by a <classname>java.sql.Statement</classname>
object:</para>
<programlisting language="none">SELECT * FROM Person WHERE login='<emphasis
role="bold">fred</emphasis>' and password = '<emphasis
role="bold">12345678</emphasis>'</programlisting>
<para>Since the login attribute is UNIQUE we are sure to
receive either 0 or 1 dataset. Our programmer proposes to
grant login if the query returns at least one dataset.</para>
<para>Discuss this implementation sketch with a colleague. Do
you think this is a sensible approach? <emphasis
role="bold">Write down</emphasis> your results.</para>
</question>
<answer>
<para>The approach is essentially unusable due to severe
security implications. Since it is based on
<classname>java.sql.Statement</classname> rather than on
<classname>java.sql.PreparedStatement</classname> objects it
is vulnerable to SQL injection attacks. A user may enter the
following password value in the GUI:</para>
<programlisting language="none">sd' OR '1' = '1</programlisting>
<para>Based on the login name <quote>fred</quote> the
following SQL string is being crafted:</para>
<programlisting language="none">SELECT * FROM Person WHERE login='fred' and password = 'sd' OR <emphasis
role="bold">'1' = '1'</emphasis>;</programlisting>
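<para>Since <code>AND</code> binds more tightly than
<code>OR</code>, the WHERE clause is evaluated as <code>(login='fred'
AND password='sd') OR '1'='1'</code>. Its last component always evaluates
to true, so all objects from the <code>Person</code> relation are
returned, thus permitting login.</para>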
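<para>How the crafted string comes into existence may be
reproduced without any database. The following sketch builds the
query by plain string concatenation, as an implementation based
on <classname>java.sql.Statement</classname> presumably would.
The class name <classname>LoginQuerySketch</classname> and the
method <methodname>buildQuery</methodname> are hypothetical and
serve illustration purposes only:</para>
<programlisting language="none">public class LoginQuerySketch {

    /** Builds the login query by naive string concatenation (vulnerable!). */
    static String buildQuery(final String login, final String password) {
        return "SELECT * FROM Person WHERE login='" + login
                + "' and password = '" + password + "'";
    }

    public static void main(final String[] args) {
        // Regular input yields the intended query
        System.out.println(buildQuery("fred", "12345678"));
        // Malicious input changes the structure of the WHERE clause
        System.out.println(buildQuery("fred", "sd' OR '1' = '1"));
    }
}</programlisting>
<para>With a <classname>java.sql.PreparedStatement</classname>
based on <code>SELECT * FROM Person WHERE login=? and
password=?</code> the same input would remain a mere password
value and could not alter the statement's structure.</para>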
<para>The implementation approach suffers from a second
deficiency: The passwords are stored in clear text. If an
attacker gains access to the <code>Person</code> table he'll
immediately retrieve the passwords of all users. This problem
can be solved by storing hash values of passwords rather than
the clear text values themselves.</para>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<qandaset defaultlabel="qanda" xml:id="exerciseHashTraining">
<title>Passwords and hash values</title>
<qandadiv>
<qandaentry>
<question>
<para>In exercise <xref linkend="exerciseInsecureAuth"/> we
discarded the idea of clear text passwords in favour of
password hashes. To defend against rainbow table attacks, so-called
salted hashes are superior. You should read <uri
xlink:href="https://www.heckrothindustries.co.uk/articles/an-introduction-to-password-hashes">https://www.heckrothindustries.co.uk/articles/an-introduction-to-password-hashes</uri>
for an overview. The article contains further references
at the bottom of the page.</para>
<para>With respect to an implementation <uri
xlink:href="http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java#11038230">http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java</uri>
provides a simple example for:</para>
<itemizedlist>
<listitem>
<para>Creating a salted hash from a given password
string.</para>
</listitem>
<listitem>
<para>Verifying whether a hash string matches a given clear text
password.</para>
</listitem>
</itemizedlist>
<para>The example uses an external library. On <productname
xlink:href="http://www.ubuntu.com">Ubuntu</productname> Linux
this may be installed by issuing <command>aptitude</command>
<option>install</option>
<option>libcommons-codec-java</option>. On successful install
the file
<filename>/usr/share/java/commons-codec-1.5.jar</filename> may
be appended to your <envar>CLASSPATH</envar>.</para>
<para>You may as well use <uri
xlink:href="http://crackstation.net/hashing-security.htm#javasourcecode">http://crackstation.net/hashing-security.htm#javasourcecode</uri>
as a starting point. This example works standalone without
needing an external library. Note: This example produces
different (incompatible) hash values.</para>
<para>Create a simple main() method to experiment with the two
class methods.</para>
</question>
<answer>
<para>Starting from <uri
xlink:href="http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java#11038230">http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java</uri>
we create a slightly modified class
<classname>sda.jdbc.intro.auth.HashProvider</classname>
offering both hash providing <coref
linkend="hashProviderMethod"/> and verifying <coref
linkend="hashVerifyMethod"/> methods:</para>
<programlisting language="none">package sda.jdbc.intro.auth;
...
public class HashProvider {
...
/** Computes a salted PBKDF2 hash of given plaintext password
suitable for storing in a database. */
public static <emphasis role="bold">String getSaltedHash</emphasis> <co
xml:id="hashProviderMethod"/>(char [] password) {
byte[] salt;
try {
salt = SecureRandom.getInstance("SHA1PRNG").generateSeed(saltLen);
// store the salt with the password
return Base64.encodeBase64String(salt) + "$" + hash(password, salt);
} catch (NoSuchAlgorithmException e) {
e.printStackTrace();
}
System.exit(1);
return null;
}
/** Checks whether given plaintext password corresponds
to a stored salted hash of the password. */
public static <emphasis role="bold">boolean check</emphasis> <co
xml:id="hashVerifyMethod"/>(char[] password, String stored){
String[] saltAndPass = stored.split("\\$");
if (saltAndPass.length != 2)
return false;
String hashOfInput = hash(password, Base64.decodeBase64(saltAndPass[0]));
return hashOfInput.equals(saltAndPass[1]);
}
...}</programlisting>
<para>We may test the two class methods
<methodname>sda.jdbc.intro.auth.HashProvider.getSaltedHash(char[])</methodname>
and
<methodname>sda.jdbc.intro.auth.HashProvider.check(char[],String)</methodname>
by a separate driver class. Notice the <quote>$</quote> sign
<coref linkend="saltPwhashSeparator"/> separating salt and
password hash:</para>
<programlisting language="none">package sda.jdbc.intro.auth;
public class TestHashProvider {
public static void main(String [] args) throws Exception {
final char [] clearText = {'s', 'e', 'c'};
final String hash = <emphasis role="bold">HashProvider.getSaltedHash(clearText)</emphasis>;
System.out.println("Hash:" + hash);
if (HashProvider.check(clearText, <co
xml:id="saltPwhashSeparator"/>
"<emphasis role="bold">HwX2DkuYiwp7xogm3AGndza8DKRVvCMntxRvCrCGFPw=</emphasis>$<emphasis
role="bold">6Ix11yHNB4uPZuF2IQYxVV/MYragJwTDE33OIFR9a24=</emphasis>")) {
System.out.println("hash matches");
} else {
System.out.println("hash does not match"); ...</programlisting>
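<para>The same idea can also be sketched using JDK classes only. The following is a minimal sketch and not the <classname>HashProvider</classname> class shown above: the class name <classname>SaltedHashSketch</classname>, the iteration count, the salt and key lengths and the algorithm name <code>PBKDF2WithHmacSHA1</code> are assumptions chosen for illustration. Its hash values are not compatible with those printed above.</para>

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Illustrative sketch only; class name and parameters are assumptions. */
class SaltedHashSketch {
  private static final int ITERATIONS = 20000; // assumed work factor
  private static final int SALT_LEN = 32;      // salt length in bytes (assumed)
  private static final int KEY_LEN = 256;      // derived key length in bits (assumed)

  /** Returns "salt$hash", both parts Base64 encoded. */
  static String getSaltedHash(char[] password) {
    final byte[] salt = new byte[SALT_LEN];
    new SecureRandom().nextBytes(salt);
    final Base64.Encoder enc = Base64.getEncoder();
    return enc.encodeToString(salt) + "$" + enc.encodeToString(pbkdf2(password, salt));
  }

  /** Recomputes the hash using the stored salt and compares both values. */
  static boolean check(char[] password, String stored) {
    final String[] saltAndHash = stored.split("\\$");
    if (saltAndHash.length != 2) {
      return false;
    }
    final Base64.Decoder dec = Base64.getDecoder();
    final byte[] salt = dec.decode(saltAndHash[0]);
    final byte[] expected = dec.decode(saltAndHash[1]);
    // MessageDigest.isEqual avoids leaking the mismatch position via timing.
    return MessageDigest.isEqual(expected, pbkdf2(password, salt));
  }

  private static byte[] pbkdf2(char[] password, byte[] salt) {
    final PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_LEN);
    try {
      return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1")
          .generateSecret(spec).getEncoded();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("PBKDF2 not available", e);
    } finally {
      spec.clearPassword(); // wipe the key spec's internal password copy
    }
  }
}
```

<para>Since each call draws a fresh random salt, hashing the same password twice yields two different strings. Both of them are accepted by <code>check()</code>.</para>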
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<qandaset defaultlabel="qanda" xml:id="exerciseInsertLoginCredentials">
<title>GUI authentication: The real McCoy</title>
<qandadiv>
<qandaentry>
<question>
<para>We now implement a refined version to enter
<code>Person</code> records based on the solutions of two
related exercises:</para>
<glosslist>
<glossentry>
<glossterm><xref
linkend="exercisefilterUserInput"/></glossterm>
<glossdef>
<para>Avoiding SQL injection by sanitizing user
input</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm><xref
linkend="exerciseSqlInjectPrepare"/></glossterm>
<glossdef>
<para>Avoiding SQL injection by using
<classname>java.sql.PreparedStatement</classname>
objects.</para>
</glossdef>
</glossentry>
</glosslist>
<para>A better solution should combine both techniques.
Non-vulnerability is a basic requirement. Checking an e-mail
address for minimal conformance is an added value.</para>
<para>In order to address authentication the relation Person
has to be extended appropriately. The GUI needs two additional
fields for login name and password as well. The following
video demonstrates the intended behaviour:</para>
<figure xml:id="videoConnectAuth">
<title>Intended usage behaviour for insertion of data
records.</title>
<mediaobject>
<videoobject>
<videodata fileref="Ref/Video/connectauth.mp4"/>
</videoobject>
</mediaobject>
</figure>
<para>Don't forget to use password hashes like those from
<xref linkend="exerciseHashTraining"/>. Due to their length
you may want to consider the data type
<code>TEXT</code>.</para>
</question>
<answer>
<para>In comparison to earlier versions it makes sense to
add some internal container structures. First we note that
each GUI input field requires:</para>
<itemizedlist>
<listitem>
<para>A label like <quote>Enter password</quote>.</para>
</listitem>
<listitem>
<para>A corresponding field object to hold user entered
input.</para>
</listitem>
<listitem>
<para>A validator checking for correctness of entered
data.</para>
</listitem>
<listitem>
<para>A label or text field for warning messages in case
of invalid user input.</para>
</listitem>
</itemizedlist>
<para>First we start by grouping label <coref
linkend="uiuLabel"/>, input field's verifier <coref
linkend="uiuVerifier"/> and the error message label <coref
linkend="uiuErrmsg"/> in
<classname>sda.jdbc.intro.auth.UserInputUnit</classname>:</para>
<programlisting language="none">package sda.jdbc.intro.auth;
...
public class UserInputUnit {
final JLabel label; <co xml:id="uiuLabel"/>
final InputVerifierNotify verifier; <co xml:id="uiuVerifier"/>
final JLabel errorMessage; <co xml:id="uiuErrmsg"/>
public UserInputUnit(final String guiText, final InputVerifierNotify verifier) {
this.label = new JLabel(guiText);
this.verifier = verifier;
errorMessage = new JLabel();
} ...</programlisting>
<para>The actual GUI text field is defined <coref
linkend="verfierGuiField"/> in class
<classname>sda.jdbc.intro.auth.InputVerifierNotify</classname>:</para>
<programlisting language="none">package sda.jdbc.intro.auth;
...
public abstract class InputVerifierNotify extends InputVerifier {
protected final String errorMessage;
public final JLabel validationLabel;
public final JTextField field; <co xml:id="verfierGuiField"/>
public InputVerifierNotify(final JTextField field, final String errorMessage) { ...</programlisting>
<para>We need two field verifier classes derived from
<classname>sda.jdbc.intro.auth.InputVerifierNotify</classname>:</para>
<glosslist>
<glossentry>
<glossterm><classname>sda.jdbc.intro.auth.RegexpVerifier</classname></glossterm>
<glossdef>
<para>This one is well known from earlier versions and
is used to validate text input fields by regular
expressions.</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm><classname>sda.jdbc.intro.auth.EqualValueVerifier</classname></glossterm>
<glossdef>
<para>This verifier class is responsible for comparing
our two password fields to have identical values.</para>
</glossdef>
</glossentry>
</glosslist>
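<para>The checks performed by both verifiers can be tried independently of Swing. The following is a minimal sketch under stated assumptions: the class <classname>InputChecks</classname> and its method names are hypothetical. The name and password patterns are the ones used in the <classname>InsertPerson</classname> listing, whereas the e-mail pattern is merely an assumed example of a <quote>minimal conformance</quote> check.</para>

```java
import java.util.Arrays;
import java.util.regex.Pattern;

/** Swing-free sketch of the two kinds of checks; all names are assumptions. */
class InputChecks {
  // Patterns for name and password as used in the InsertPerson listing.
  static final Pattern NAME = Pattern.compile("^[^;'\"]+$");
  static final Pattern PASSWORD = Pattern.compile("^.{6,20}$");
  // Hypothetical minimal e-mail pattern: something@something.something
  static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

  /** Regular expression check, the idea behind RegexpVerifier. */
  static boolean matches(Pattern pattern, CharSequence input) {
    return pattern.matcher(input).matches();
  }

  /** Equality check on raw characters, the idea behind EqualValueVerifier. */
  static boolean sameValue(char[] first, char[] second) {
    return Arrays.equals(first, second);
  }
}
```

<para>Comparing <code>char[]</code> values directly fits <code>JPasswordField.getPassword()</code>, which returns the password as a character array rather than a string.</para>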
<para>All these components get assembled in
<classname>sda.jdbc.intro.auth.InsertPerson</classname>. We
remark some important points:</para>
<programlisting language="none">package sda.jdbc.intro.auth;
...
public class InsertPerson extends JFrame {
... // GUI attributes for user input
final UserInputUnit name = <co linkends="listingInsertUserAuth-1"
xml:id="listingInsertUserAuth-1-co"/>
new UserInputUnit(
"Name",
new RegexpVerifier(new JTextField(15), "^[^;'\"]+$", "No special characters allowed"));
// We need a reference to the password field to avoid
// casting from JTextField later.
private final JPasswordField passwordField = new JPasswordField(10); <co
linkends="listingInsertUserAuth-2"
xml:id="listingInsertUserAuth-2-co"/>
private final UserInputUnit password =
new UserInputUnit(
"Password",
new RegexpVerifier(passwordField, "^.{6,20}$", "length from 6 to 20 characters"));
...
private final UserInputUnit passwordRepeat =
new UserInputUnit(
"repeat pass.",
new EqualValueVerifier <co linkends="listingInsertUserAuth-3"
xml:id="listingInsertUserAuth-3-co"/> (new JPasswordField(10), passwordField, "Passwords do not match"));
private final UserInputUnit [] userInputUnits = <co
linkends="listingInsertUserAuth-4"
xml:id="listingInsertUserAuth-4-co"/>
{name, email, login, password, passwordRepeat};
...
private void userLoginDialog() {...}
...
public InsertPerson (){
...
databaseFieldPanel.setLayout(new GridLayout(0, 3)); //Third column for validation label
add(databaseFieldPanel);
for (UserInputUnit unit: userInputUnits) { <co
linkends="listingInsertUserAuth-5"
xml:id="listingInsertUserAuth-5-co"/>
databaseFieldPanel.add(unit.label);
databaseFieldPanel.add(unit.verifier.field);
databaseFieldPanel.add(unit.verifier.validationLabel);
}
insertButton.addActionListener(new ActionListener() {
@Override public void actionPerformed(ActionEvent e) {
if (inputValuesAllValid()) {
if (persistenceHandler.add( <co
linkends="listingInsertUserAuth-6"
xml:id="listingInsertUserAuth-6-co"/>
name.getText(),
email.getText(),
login.getText(),
passwordField.getPassword())) {
clearMask();
...}
private void clearMask() { <co linkends="listingInsertUserAuth-7"
xml:id="listingInsertUserAuth-7-co"/>
for (UserInputUnit unit: userInputUnits) {
unit.verifier.field.setText("");
unit.verifier.clear();
}
}
private boolean inputValuesAllValid() {<co
linkends="listingInsertUserAuth-8"
xml:id="listingInsertUserAuth-8-co"/>
for (UserInputUnit unit: userInputUnits) {
if (!unit.verifier.verify(unit.verifier.field)){
return false;
}
}
return true;
}
}</programlisting>
<calloutlist>
<callout arearefs="listingInsertUserAuth-1-co"
xml:id="listingInsertUserAuth-1">
<para>All GUI related stuff for entering a user's
name</para>
</callout>
<callout arearefs="listingInsertUserAuth-2-co"
xml:id="listingInsertUserAuth-2">
<para>Password fields need special treatment:
<code>getText()</code> is superseded by
<code>getPassword()</code>. In order to avoid casts from
<classname>javax.swing.JTextField</classname> to
<classname>javax.swing.JPasswordField</classname> we
simply keep an extra reference.</para>
</callout>
<callout arearefs="listingInsertUserAuth-3-co"
xml:id="listingInsertUserAuth-3">
<para>In order to check both password fields for identical
values we need a different validator
<classname>sda.jdbc.intro.auth.EqualValueVerifier</classname>
expecting both password fields in its constructor.</para>
</callout>
<callout arearefs="listingInsertUserAuth-4-co"
xml:id="listingInsertUserAuth-4">
<para>All five user input units are grouped in an array.
This allows for iterations like in <coref
linkend="listingInsertUserAuth-7-co"/> or <coref
linkend="listingInsertUserAuth-8-co"/>.</para>
</callout>
<callout arearefs="listingInsertUserAuth-5-co"
xml:id="listingInsertUserAuth-5">
<para>Adding all GUI elements to the base pane in a
loop.</para>
</callout>
<callout arearefs="listingInsertUserAuth-6-co"
xml:id="listingInsertUserAuth-6">
<para>Providing user entered values to the persistence
provider.</para>
</callout>
<callout arearefs="listingInsertUserAuth-7-co"
xml:id="listingInsertUserAuth-7">
<para>Whenever a dataset has been successfully sent to the
database we have to clean our GUI to possibly enter
another record.</para>
</callout>
<callout arearefs="listingInsertUserAuth-8-co"
xml:id="listingInsertUserAuth-8">
<para>Thanks to our grouping, aggregating the validation
states of the individual GUI input fields becomes easy.</para>
</callout>
</calloutlist>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<qandaset defaultlabel="qanda" xml:id="quandaentry_ArchSecurity">
<title>Architectural security considerations</title>
<qandadiv>
<qandaentry>
<question>
<para>In <xref linkend="exerciseInsertLoginCredentials"/> we
achieved end user credential protection. How about the overall
application security? Provide improvement proposals if
appropriate. Hint: Consider the way credentials are being
supplied.</para>
</question>
<answer>
<para>Connecting the client to our database server solely
depends on credentials <coref
linkend="databaseUserHdmPassword"/> being stored in a
properties file
<filename>database.properties</filename>:</para>
<programlisting language="none">PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm
PersistenceHandler.username=hdmuser <co xml:id="databaseUserHdmUsername"/>
PersistenceHandler.password=<emphasis role="bold">XYZ</emphasis> <co
xml:id="databaseUserHdmPassword"/></programlisting>
<para>This properties file is user accessible and contains the
password in clear text. Any application connecting to the
database server with this account holds all permissions
granted to <code>hdmuser</code> <coref
linkend="databaseUserHdmUsername"/>. For our application to
work correctly these permissions must at least include
inserting datasets. Thus an attacker may insert a new user,
e.g. <code>smith</code>, including credentials and afterwards
start the original application logging in as
<code>smith</code>.</para>
<para>Conclusion: The current application architecture is
seriously flawed with respect to security.</para>
<para>Rather than using a common database account
<code>hdmuser</code> we may configure per-user accounts on the
database server having individual user credentials. This way
user credentials are no longer stored in our
<code>Person</code> table but are being managed by the
database server's user management and privilege facilities.
This completely avoids storing credentials on the client
side.</para>
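<para>On the client side this proposal might look as follows. This is a minimal sketch, assuming that a database account already exists for each end user: the class <classname>PerUserConnection</classname> and its methods are hypothetical, and the JDBC URL is expected to come from a configuration that no longer holds any password.</para>

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

/** Sketch: each end user authenticates against the database server itself. */
class PerUserConnection {
  /** Builds JDBC connection properties from credentials typed in by the user. */
  static Properties credentials(String login, char[] password) {
    final Properties props = new Properties();
    props.setProperty("user", login);                    // standard JDBC property key
    props.setProperty("password", new String(password)); // standard JDBC property key
    return props;
  }

  /** Opens a connection as the given database user; no credentials are read from a file. */
  static Connection connect(String jdbcUrl, String login, char[] password)
      throws SQLException {
    return DriverManager.getConnection(jdbcUrl, credentials(login, password));
  }
}
```

<para>A failed login then shows up as an <classname>java.sql.SQLException</classname> thrown by <code>connect()</code>, and the permissions of a successful session are exactly those granted to that individual database user.</para>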
</answer>
</qandaentry>
</qandadiv>
</qandaset>
<section xml:id="sda1SaxRdbms">
<title>SAX and RDBMS</title>
<qandaset defaultlabel="qanda" xml:id="exercise_saxAttrib">
<title>Reading XML attributes</title>
<qandadiv>
<qandaentry xml:id="saxRdbms">
<question>
<label>SAX processing with RDBMS access.</label>
<para>Implement the example given in <xref
linkend="saxRdbmsAccessPrinciple"/> to produce the output
sketched in <xref linkend="saxPriceOut"/>. You may start by
implementing <emphasis>and testing</emphasis> the following
methods of an RDBMS interfacing class using <trademark
xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>:</para>
<programlisting language="none">package sax.rdbms;
public class RdbmsAccess {
public void connect(final String host, final int port,
final String userName, final String password) {
// <emphasis role="bold">open connection to a database</emphasis>
}
public String readPrice(final String articleNumber) {
return "0"; // <emphasis role="bold">To be implemented as access to a ResultSet object</emphasis>
}
public void close() {
// <emphasis role="bold">close database connection</emphasis>
}
}</programlisting>
<para>You may find it helpful to write a small testbed for
the RDBMS access functionality prior to integrating it into
your <acronym
xlink:href="http://www.saxproject.org">SAX</acronym>
application producing HTML output.</para>
</question>
<answer>
<para>We start by creating a suitable RDBMS Table:</para>
<programlisting language="none">CREATE SCHEMA AUTHORIZATION midb2
CREATE TABLE Product(
orderNo CHAR(10) NOT NULL PRIMARY KEY
,price DECIMAL (9,2) NOT NULL
)</programlisting>
<para>Next we feed some toy data:</para>
<programlisting language="none">INSERT INTO Product VALUES('x-223', 330.20);
INSERT INTO Product VALUES('w-124', 110.40);</programlisting>
<para>Now we implement our RDBMS access class:</para>
<programlisting language="none">package dom.xsl;
...
public class DbAccess {
public void connect(final String jdbcUrl,
final String userName, final String password) {
try {
conn = DriverManager.getConnection(jdbcUrl, userName, password);
priceQuery = conn.prepareStatement(sqlPriceQuery);
} catch (SQLException e) {
System.err.println("Unable to open connection to database:" + e);}
}
public String readPrice(final String articleNumber) {
String result;
try {
priceQuery.setString(1, articleNumber);
final ResultSet rs = priceQuery.executeQuery();
if (rs.next()) {
result = rs.getString("price");
} else {
result = "No price available for article '" + articleNumber + "'";
}
} catch (SQLException e) {
result = "Error reading price for article '" + articleNumber + "':" + e;
}
return result;
}
public void close() {
try {conn.close();} catch (SQLException e) {
System.err.println("Error closing database connection:" + e);
}
}
static {
try { Class.forName("com.ibm.db2.jcc.DB2Driver");
} catch (ClassNotFoundException e) {
System.err.println("Unable to register Driver:" + e);}
}
private static final String sqlPriceQuery =
"SELECT price FROM Product WHERE orderNo = ?";
private PreparedStatement priceQuery = null;
private Connection conn = null;
}</programlisting>
<para>This access layer may be tested independently from
handling catalog instances:</para>
<programlisting language="none">package dom.xsl;
public class DbAccessDriver {
public static void main(String[] args) {
final DbAccess dbaccess = new DbAccess();
dbaccess.connect("jdbc:db2://db2.mi.hdm-stuttgart.de:10000/hdm",
"midb2", "password");
System.out.println(dbaccess.readPrice("x-223"));
System.out.println(dbaccess.readPrice("..aaargh!"));
dbaccess.close();
}
}</programlisting>
<para>If the above test succeeds we may embed the RDBMS
access layer into our <acronym
xlink:href="http://www.saxproject.org">SAX</acronym>
handler:</para>
<programlisting language="none">package sax.rdbms;
...
public class HtmlEventHandler extends DefaultHandler{
public void startDocument() {
dbaccess.connect("jdbc:db2://db2.mi.hdm-stuttgart.de:10000/hdm",
"midb2", "password");
System.out.println("<html><head><title>Catalog</title></head>");
}
public void endDocument() {
System.out.println("</html>");
dbaccess.close();
}
public void startElement(String namespaceUri, String localName,
String rawName, Attributes attrs){
if (rawName.equals("catalog")){
System.out.println("<body><H1>A catalog</H1>"
+"<table border='1'><tbody>");
System.out.println("<tr><th>Order number</th>\n"
+ "<th>Price</th>\n"
+" <th>Product</th></tr>");
} else if (rawName.equals("item")){
final String orderNo = attrs.getValue("orderNo");
System.out.print("<tr><td>" + orderNo
+ "</td>\n<td>" + dbaccess.readPrice(orderNo)
+ "</td>\n<td>");
} else {
System.err.println("Element '" + rawName + "' unknown");
}
}
public void endElement(String namespaceUri, String localName,
String rawName) {
if (rawName.equals("catalog")){
System.out.println("</tbody></table>");
} else if (rawName.equals("item")){
System.out.println("</td></tr>\n");
}
}
public void characters(char[] ch, int start, int length) {
System.out.print(new String(ch, start, length));
}
private DbAccess dbaccess = new DbAccess();
}</programlisting>
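<para>The attribute handling in <code>startElement()</code> can be tried without a database server. In the following minimal sketch a map stands in for <code>DbAccess.readPrice()</code>. The class <classname>CatalogSketch</classname>, the method <code>render()</code> and the plain text output format are assumptions made for illustration. The two prices are the toy data inserted above.</para>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: a map replaces the database so the handler logic can be tried offline. */
class CatalogSketch extends DefaultHandler {
  private final Map<String, String> prices = new HashMap<>();
  private final StringBuilder out = new StringBuilder();

  CatalogSketch() {
    prices.put("x-223", "330.20"); // toy data from the INSERT statements
    prices.put("w-124", "110.40");
  }

  @Override
  public void startElement(String namespaceUri, String localName,
      String rawName, Attributes attrs) {
    if (rawName.equals("item")) {
      // Attributes are only available here, not in endElement().
      final String orderNo = attrs.getValue("orderNo");
      final String price = prices.getOrDefault(orderNo, "n/a");
      out.append(orderNo).append(": ").append(price).append('\n');
    }
  }

  /** Parses the given catalog and returns one "orderNo: price" line per item. */
  static String render(String catalogXml) {
    final CatalogSketch handler = new CatalogSketch();
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(catalogXml)), handler);
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException("Unable to parse catalog", e);
    }
    return handler.out.toString();
  }
}
```

<para>Once this works, replacing the map lookup by a call to <code>readPrice()</code> is the only change needed to arrive at the database backed handler.</para>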
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
</section>
</section>
</chapter>
<chapter xml:id="chapUnitTesting">
<title>Unit testing with <productname
xlink:href="http://testng.org">TestNG</productname></title>
<para>This chapter presents a very short introduction to the basic usage
of unit testing. We start with a simple stack implementation:</para>
<programlisting language="none">package sda.unittesting;
public class MyStack {
int [] data = new int[5];
int numElements = 0;
public void push(final int n) {
data[numElements] = n;
numElements++;
}
public int pop() {
numElements--;
return data[numElements];
}
public int top() {
return data[numElements - 1];
}
public boolean empty() {
return 0 == numElements;
}
}</programlisting>
<para>Readers familiar with stacks will immediately notice a
deficiency in the above code: This stack is bounded. It stores
at most five integer values.</para>
<para>The following implementation allows us to functionally test our
<classname>sda.unittesting.MyStack</classname> implementation with respect
to the usual stack behaviour:</para>
<programlisting language="none" linenumbering="numbered">package sda.unittesting;
public class MyStackFuncTest {
private static void assertTrue(boolean status) {
if (!status) {
throw new RuntimeException("Assert failed");
}
}
public static void main(String[] args) {
final MyStack stack = new MyStack();
// Test 1: A new MyStack instance should not contain any elements.
assertTrue(stack.empty());
// Test 2: Adding and removal
stack.push(4);
assertTrue (!stack.empty());
assertTrue (4 == stack.top());
assertTrue (4 == stack.pop());
assertTrue (stack.empty());
// Test 3: Trying to add more than five values
stack.push(1);stack.push(2);stack.push(3);stack.push(4);
stack.push(5);
stack.push(6);
assertTrue(6 == stack.pop());
}
}</programlisting>
<para>Execution yields a runtime exception which is due to the attempted
insert operation <code>stack.push(6)</code>:</para>
<programlisting language="none">Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5
at sda.unittesting.MyStack.push(MyStack.java:8)
at sda.unittesting.MyStackFuncTest.main(MyStackFuncTest.java:20)</programlisting>
<para>The execution result is easy to understand since our
<classname>sda.unittesting.MyStack</classname> implementation can only
store five values.</para>
<para>Our testing application is fine so far. It does however lack some
features:</para>
<itemizedlist>
<listitem>
<para>Automatic initialization before starting tests and finalization
at the end.</para>
</listitem>
<listitem>
<para>Our test is monolithic: We used comments to document different
tests. This knowledge is implicit and thus invisible to testing
frameworks. Test results (failure/success) cannot be assigned to test
1, test 2 for example.</para>
</listitem>
<listitem>
<para>Aggregation and visualization of test results</para>
</listitem>
<listitem>
<para>Dependencies between individual tests</para>
</listitem>
<listitem>
<para>Ability to enable and disable tests according to a project's
maturity level. In our example test 3 might be disabled till an
unbounded implementation gets completed.</para>
</listitem>
</itemizedlist>
<para>Testing frameworks like <productname
xlink:href="http://junit.org">JUnit</productname> or <productname
xlink:href="http://testng.org">TestNG</productname> provide means for
efficient and flexible test organization. Using <productname
xlink:href="http://testng.org">TestNG</productname> our current test
application including only test 1 and test 2 reads:</para>
<programlisting language="none">package sda.unittesting;
import org.testng.annotations.Test;
public class MyStackTestSimple {
final MyStack stack = new MyStack();
@Test
public void empty() {
assert(stack.empty());
}
@Test
public void pushPopEmpty() {
assert (stack.empty());
stack.push(4);
assert (!stack.empty());
assert (4 == stack.top());
assert (4 == stack.pop());
assert (stack.empty());
}
}</programlisting>
<para>We notice the absence of a <function>main()</function> method. Our
testing framework uses the above code for test definitions. In contrast to
our homebrew solution the individual tests are now defined in a machine
readable fashion. This allows for sophisticated statistics. Executing
inside <productname xlink:href="http://testng.org">TestNG</productname>
produces the following results:</para>
<programlisting language="none">PASSED: empty
PASSED: pushPopEmpty
===============================================
Default test
Tests run: 2, Failures: 0, Skips: 0
===============================================
===============================================
Default suite
Total tests run: 2, Failures: 0, Skips: 0
===============================================</programlisting>
<para>Both tests run successfully. So why did we omit test 3 which is
bound to fail? We now add it to the test suite:</para>
<programlisting language="none">package sda.unittesting;
...
public class MyStackTestSimple1 {
...
@Test
public void empty() {
assert(stack.empty());
...
@Test
public void push6() {
stack.push(1);
stack.push(2);
stack.push(3);
stack.push(4);
stack.push(5);
stack.push(6);
assert (6 == stack.pop());
} ...</programlisting>
<para>As expected test 3 fails. But the result shows test 2 failing as
well:</para>
<programlisting language="none">PASSED: empty
FAILED: push6
java.lang.ArrayIndexOutOfBoundsException: 5
at sda.unittesting.MyStack.push(MyStack.java:8)
at sda.unittesting.MyStackTestSimple1.push6(MyStackTestSimple1.java:30)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
FAILED: pushPopEmpty
java.lang.AssertionError
at sda.unittesting.MyStackTestSimple1.pushPopEmpty(MyStackTestSimple1.java:15)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
===============================================
Default test
Tests run: 3, Failures: 2, Skips: 0
===============================================</programlisting>
<para>This unexpected result is due to the execution order of the three
individual tests. Within our class
<classname>sda.unittesting.MyStackTestSimple1</classname> the three tests
appear in the sequence test 1, test 2 and test 3. This however is just the
order of source code. The testing framework will not infer any order and
thus execute our three tests in <emphasis role="bold">arbitrary</emphasis>
order. The execution log shows the actual order:</para>
<orderedlist>
<listitem>
<para>Test <quote><code>empty</code></quote></para>
</listitem>
<listitem>
<para>Test <quote><code>push6</code></quote></para>
</listitem>
<listitem>
<para>Test <quote><code>pushPopEmpty</code></quote></para>
</listitem>
</orderedlist>
<para>So the second test will raise an exception and leave the stack
filled with the maximum possible five elements. Thus it is not empty and
the <quote><code>pushPopEmpty</code></quote> test fails as well.</para>
<para>If we want to avoid this type of error we may:</para>
<itemizedlist>
<listitem>
<para>Declare tests within separate (test class) definitions</para>
</listitem>
<listitem>
<para>Define dependencies like test X can only be executed after test
Y.</para>
</listitem>
</itemizedlist>
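<para>The underlying problem, state leaking from one test into the next, can be reproduced without any testing framework. The following minimal sketch repeats the bounded <classname>MyStack</classname> from the beginning of this chapter. The class <classname>IsolationSketch</classname> and its helper method are hypothetical. With <productname xlink:href="http://testng.org">TestNG</productname> a fresh instance per test can be obtained by creating the stack inside a method annotated with <code>@BeforeMethod</code>.</para>

```java
/** Bounded stack as defined at the beginning of this chapter. */
class MyStack {
  int[] data = new int[5];
  int numElements = 0;
  public void push(final int n) { data[numElements] = n; numElements++; }
  public int pop() { numElements--; return data[numElements]; }
  public int top() { return data[numElements - 1]; }
  public boolean empty() { return 0 == numElements; }
}

/** Sketch: state leaking between tests and how a fresh instance avoids it. */
class IsolationSketch {
  /** Mimics test push6; returns true if one of the six pushes was rejected. */
  static boolean pushSixFails(final MyStack stack) {
    try {
      for (int i = 1; i <= 6; i++) {
        stack.push(i);
      }
      return false;
    } catch (ArrayIndexOutOfBoundsException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    final MyStack shared = new MyStack();
    pushSixFails(shared);
    // The five successful pushes are still on the shared stack.
    System.out.println("shared stack empty after push6: " + shared.empty()); // false
    // A test receiving its own instance is unaffected.
    System.out.println("fresh stack empty: " + new MyStack().empty());       // true
  }
}
```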
<para>The <productname xlink:href="http://testng.org">TestNG</productname>
framework offers a feature which allows the definition of test groups and
dependencies between them. We use this feature to refine our test
definition:</para>
<programlisting language="none">package sda.unittesting;
...
public class MyStackTest {
...
@Test (<emphasis role="bold">groups = "basic"</emphasis>)
public void empty() {
assert(stack.empty());
}
@Test (<emphasis role="bold">groups = "basic"</emphasis>)
public void pushPopEmpty() {
...
}
@Test (<emphasis role="bold">dependsOnGroups = "basic"</emphasis>)
public void push6() {
...
}</programlisting>
<para>The first two tests will now belong to the same test group
<quote>basic</quote>. The <emphasis role="bold"><code>dependsOnGroups =
"basic"</code></emphasis> declaration will guarantee that our
<code>push6</code> test will be launched as the last one. So we get the
expected result:</para>
<programlisting language="none">PASSED: empty
PASSED: pushPopEmpty
FAILED: push6
java.lang.ArrayIndexOutOfBoundsException: 5
at sda.unittesting.MyStack.push(MyStack.java:8)
at sda.unittesting.MyStackTest.push6(MyStackTest.java:30)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
===============================================
Default test
Tests run: 3, Failures: 1, Skips: 0
===============================================</programlisting>
<para>In fact the order of the first two tests might be critical as
well. The <quote><code>pushPopEmpty</code></quote> test leaves our stack
in an empty state. If this were not the case, executing
<quote><code>pushPopEmpty</code></quote> before
<quote><code>empty</code></quote> would cause a failure as well.</para>
<para>Programming <abbrev
xlink:href="http://en.wikipedia.org/wiki/Integrated_development_environment">IDE</abbrev>s
like Eclipse provide elements for test result visualization. Our last test
gets summarized as:</para>
<screenshot>
<info>
<title><productname
xlink:href="http://testng.org">TestNG</productname> result
presentation in Eclipse</title>
</info>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Screen/eclipseTestngResult.screen.png"
scale="75"/>
</imageobject>
</mediaobject>
</screenshot>
<para>We can drill down from a result of type failure to its occurrence
within the corresponding code.</para>
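<para>One possible way to make the <code>push6</code> test pass is an unbounded stack which enlarges its array on demand. The following is a minimal sketch. The class name <classname>UnboundedStack</classname> and the strategy of doubling the capacity are assumptions. Like the original, it does not guard against calling <code>pop()</code> or <code>top()</code> on an empty stack.</para>

```java
import java.util.Arrays;

/** Sketch of an unbounded variant of MyStack; the name is an assumption. */
class UnboundedStack {
  private int[] data = new int[5];
  private int numElements = 0;

  public void push(final int n) {
    if (numElements == data.length) {
      // Array is full: continue with a copy of twice the size.
      data = Arrays.copyOf(data, 2 * data.length);
    }
    data[numElements] = n;
    numElements++;
  }
  public int pop() {
    numElements--;
    return data[numElements];
  }
  public int top() {
    return data[numElements - 1];
  }
  public boolean empty() {
    return 0 == numElements;
  }
}
```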
</chapter>
<chapter xml:id="fo">
<title>Generating printed output</title>
<titleabbrev>Print</titleabbrev>
<section xml:id="foIntro">
<title>Online and print versions</title>
<titleabbrev>online / print</titleabbrev>
<para>We already learned how to transform XML documents into HTML by
means of an <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev>
style sheet processor. In principle we may create printed output by
using an HTML browser's print function. However the result will not meet
reasonable typographical standards. A list of commonly required features
for printed output includes:</para>
<variablelist>
<varlistentry>
<term>Line breaks</term>
<listitem>
<para>Text paragraphs have to be divided into lines. To achieve
best results the processor must implement the hyphenation rules of
the language in question in order to automatically hyphenate long
words. This is especially important for text columns of limited
width as appearing in newspapers.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Page breaks</term>
<listitem>
<para>Since printed pages are limited in height the content has to
be broken into pages. This may be difficult to achieve:</para>
<itemizedlist>
<listitem>
<para>Large images being indivisible may have to be deferred
to the following page leaving large amounts of empty
space.</para>
</listitem>
<listitem>
<para>Long tables may have to be subdivided into smaller
blocks. Thus it may be required to define sets of additional
footers like <quote>to be continued on the next page</quote>
and additional table headers containing column descriptions on
subsequent pages.</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>Page references</term>
<listitem>
<para>Document internal references via <link
xlink:href="http://www.w3.org/TR/xml#id">ID</link> / <link
xlink:href="http://www.w3.org/TR/xml#idref">IDREF</link> pairs may
be represented as page references like <quote>see page
32</quote>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Left and right pages</term>
<listitem>
<para>Books usually have a different layout for
<quote>left</quote> and <quote>right</quote> pages. Page numbers
usually appear on the left side of a <quote>left</quote> page and
vice versa.</para>
<para>Very often the head of each page contains additional
information e.g. a chapter's name on each <quote>left</quote> page
head and the actual section's name on each <quote>right</quote>
page's head.</para>
<para>In addition chapters usually start on a <quote>right</quote>
page. Sometimes a chapter's starting page has special layout
features e.g. a missing description in the page's head which will
only be given on subsequent pages.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Footnotes</term>
<listitem>
<para>Footnotes have to be numbered on a per page basis and have
to appear on the current page.</para>
</listitem>
</varlistentry>
</variablelist>
</section>
<section xml:id="foStart">
<title>A simple <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
document</title>
<titleabbrev>Simple <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev></titleabbrev>
<para>A renderer for printed output from XML content also needs
instructions how to format the different elements. A common way to
define these formatting properties is the <emphasis>Formatting
Objects</emphasis> (<abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>)
standard. <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
documents may be compared to HTML. An HTML document has to be rendered by
a piece of software called a browser in order to be viewed as an image.
Likewise <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
documents have to be rendered by a piece of software called a formatting
objects processor which typically yields PostScript or PDF output. As a
starting point we take a simple example:</para>
<figure xml:id="foHelloWorld">
<title>The most simple <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
document</title>
<programlisting language="none"><?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<!-- Define a simple page layout -->
<fo:simple-page-master master-name="simplePageLayout"
page-width="60mm" page-height="100mm">
<fo:region-body/>
</fo:simple-page-master>
</fo:layout-master-set>
<!-- Print a set of pages using the previously defined layout -->
<fo:page-sequence master-reference="simplePageLayout">
<fo:flow flow-name="xsl-region-body">
<emphasis role="bold"><fo:block>Hello, World ...</fo:block></emphasis>
</fo:flow>
</fo:page-sequence>
</fo:root></programlisting>
</figure>
<para>PDF generation is initiated by executing an <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
processor. At the MI department the script <code>fo2pdf</code> invokes
<orgname>RenderX</orgname>'s <productname
xlink:href="http://www.renderx.com">xep</productname> processor:</para>
<programlisting language="none">fo2pdf -fo hello.fo -pdf hello.pdf</programlisting>
<para>This creates a PDF file which may be printed or previewed by e.g.
<productname xlink:href="http://www.adobe.com">Adobe</productname>'s
Acrobat Reader or Evince under Linux. For a list of command line options
see <productname
xlink:href="http://www.renderx.com/reference.html">xep's
documentation</productname>.</para>
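<para>In the example above the <code>master-reference</code> attribute of <code>fo:page-sequence</code> names the page layout defined by the <code>master-name</code> attribute of <code>fo:simple-page-master</code>. A mismatch between the two is a typical source of processor errors. The following minimal sketch checks this correspondence before invoking a processor. The class <classname>FoMasterCheck</classname> and its method are hypothetical and cover nothing but this single rule.</para>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Sketch: checks that every page sequence refers to a defined page master. */
class FoMasterCheck {
  static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

  static boolean masterReferencesResolve(String foDocument) {
    final Document doc;
    try {
      final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // required for the fo: namespace lookups
      doc = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(foDocument)));
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Not a parsable XML document", e);
    }
    // Collect all declared master names.
    final Set<String> masters = new HashSet<>();
    for (String name : new String[] {"simple-page-master", "page-sequence-master"}) {
      final NodeList list = doc.getElementsByTagNameNS(FO_NS, name);
      for (int i = 0; i < list.getLength(); i++) {
        masters.add(((Element) list.item(i)).getAttribute("master-name"));
      }
    }
    // Each page sequence must refer to one of them.
    final NodeList sequences = doc.getElementsByTagNameNS(FO_NS, "page-sequence");
    for (int i = 0; i < sequences.getLength(); i++) {
      final Element sequence = (Element) sequences.item(i);
      if (!masters.contains(sequence.getAttribute("master-reference"))) {
        return false;
      }
    }
    return true;
  }
}
```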
</section>
<section xml:id="layoutParam">
<title>Page layout</title>
<para>The result of our <quote>Hello, World ...</quote> code is not
very impressive. In order to develop more elaborate examples we have to
understand the underlying layout model being defined in a <link
xlink:href="http://www.w3.org/TR/xsl/#fo_simple-page-master">fo:simple-page-master</link>
element. First of all <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
allows us to subdivide a physical page into different regions:</para>
<figure xml:id="foRegionList">
<title>Regions being defined in a page.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/regions.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>The most important area in this model is denoted by <link
xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link>.
Other regions like <link
xlink:href="http://www.w3.org/TR/xsl/#fo_region-before">fo:region-before</link>
are typically used as containers for meta information such as chapter
headings and page numbering. We take a closer look at the <link
xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link>
area and supply an example of parameterization:</para>
<figure xml:id="foParamRegBody">
<title>A complete <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
document parameterizing a physical page and its <link
xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link>.</title>
<programlisting language="none"><?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"
font-size="6pt">
<fo:layout-master-set> <co xml:id="programlisting_fobodyreg_masterset"/>
<fo:simple-page-master master-name="<emphasis role="bold">simplePageLayout</emphasis>" <co
xml:id="programlisting_fobodyreg_simplepagelayout"/>
page-width = "50mm" page-height = "80mm"
margin-top = "5mm" margin-bottom = "20mm"
margin-left = "5mm" margin-right = "10mm">
<fo:region-body <co xml:id="programlisting_fobodyreg_regionbody"/>
margin-top = "10mm" margin-bottom = "5mm"
margin-left = "10mm" margin-right = "5mm"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="<emphasis role="bold">simplePageLayout</emphasis>"> <co
xml:id="programlisting_fobodyreg_pagesequence"/>
<fo:flow flow-name="xsl-region-body"> <co
xml:id="programlisting_fobodyreg_flow"/>
<fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <co
xml:id="programlisting_fobodyreg_block"/>
<fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref
linkend="programlisting_fobodyreg_block"/>
<fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref
linkend="programlisting_fobodyreg_block"/>
<fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref
linkend="programlisting_fobodyreg_block"/>
</fo:flow>
</fo:page-sequence>
</fo:root></programlisting>
</figure>
<calloutlist>
<callout arearefs="programlisting_fobodyreg_masterset">
<para>As the name suggests multiple layout definitions can appear
here. In this example only one layout is defined.</para>
</callout>
<callout arearefs="programlisting_fobodyreg_simplepagelayout">
<para>Each layout definition carries a key attribute <code>master-name</code>
whose value is unique among all layouts appearing in
<emphasis>the</emphasis> <tag
class="starttag">fo:layout-master-set</tag>. We may thus call it a
<emphasis>primary key</emphasis> attribute. The current layout
definition's key has the value <code>simplePageLayout</code>. The
length specifications appearing here are visualized in <xref
linkend="paramRegBodyVisul"/> and correspond to the white
rectangle.</para>
</callout>
<callout arearefs="programlisting_fobodyreg_regionbody">
<para>Each layout definition <emphasis>must</emphasis> have a region
body, the region in which the document's main text flow will
appear. A layout definition <emphasis>may</emphasis> also define
top, bottom and side regions as we will see <link
linkend="paramHeadFoot">later</link>. The body region is shown with
pink background in <xref linkend="paramRegBodyVisul"/>.</para>
</callout>
<callout arearefs="programlisting_fobodyreg_pagesequence">
<para>An <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
document may have multiple page sequences, for example one per
chapter of a book. Each page sequence <emphasis>must</emphasis> reference an
<emphasis>existing</emphasis> layout definition via its
<code>master-reference</code> attribute. So we may regard this
attribute as a foreign key targeting the set of all defined layout
definitions.</para>
</callout>
<callout arearefs="programlisting_fobodyreg_flow">
<para>A flow defines the region in which output shall
appear. In the current example there is only one layout, and its
single region of type body is the only region able to receive text
output.</para>
</callout>
<callout arearefs="programlisting_fobodyreg_block">
<para>A <tag class="starttag">fo:block</tag> element may be compared
to a paragraph element <tag class="starttag">p</tag> in HTML. The
attribute <link
xlink:href="http://www.w3.org/TR/xsl/#space-after">space-after</link>="2mm"
adds 2 mm of space after each <link
xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link>
container.</para>
</callout>
</calloutlist>
<para>The result looks like:</para>
<figure xml:id="paramRegBodyVisul">
<title>Parameterizing page- and region view port. All length
dimensions are in mm.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/overlay.fig"/>
</imageobject>
</mediaobject>
</figure>
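<para>The dimensions of the body region follow from the two sets of
margins in <xref linkend="foParamRegBody"/>. As a worked sketch, assuming
that the page margins and the region body margins are both subtracted
from the page size (which is how the length specifications of this
example nest):</para>
<programlisting language="none">body width  = page-width  - page margin-left - page margin-right
                          - body margin-left - body margin-right
            = 50mm - 5mm - 10mm - 10mm - 5mm = 20mm

body height = page-height - page margin-top  - page margin-bottom
                          - body margin-top  - body margin-bottom
            = 80mm - 5mm - 20mm - 10mm - 5mm = 40mm</programlisting>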
</section>
<section xml:id="headFoot">
<title>Headers and footers</title>
<titleabbrev>Header/footer</titleabbrev>
<para>Referring to <xref linkend="foRegionList"/> we now want to add
fixed headers and footers, which are frequently used for page numbers. In a
textbook each page might show the current chapter's name in its header.
This name should not change as long as the text within <link
xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link>
still belongs to the same chapter. In <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
this is achieved by:</para>
<itemizedlist>
<listitem>
<para>Encapsulating each chapter's content in a <link
xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link>
of its own.</para>
</listitem>
<listitem>
<para>Defining the desired header text inside a <link
xlink:href="http://www.w3.org/TR/xsl/#fo_static-content">fo:static-content</link>
in the area defined by <link
xlink:href="http://www.w3.org/TR/xsl/#fo_region-before">fo:region-before</link>.</para>
</listitem>
</itemizedlist>
<para>The name <link
xlink:href="http://www.w3.org/TR/xsl/#fo_static-content">fo:static-content</link>
refers to the fact that the content is constant (static) within the
given page sequence. The new version reads:</para>
<figure xml:id="paramHeadFoot">
<title>Parameterizing header and footer.</title>
<programlisting language="none"><?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"
font-size="6pt">
<fo:layout-master-set>
<fo:simple-page-master master-name="simplePageLayout"
page-width = "50mm" page-height = "80mm"
margin-top = "5mm" margin-bottom = "20mm"
margin-left = "5mm" margin-right = "10mm">
<fo:region-body margin-top = "10mm" margin-bottom = "5mm" <co
xml:id="programlisting_head_foot_bodydef"/>
margin-left = "10mm" margin-right = "5mm"/>
<fo:region-before extent="5mm"/> <co
xml:id="programlisting_head_foot_beforedef"/>
<fo:region-after extent="5mm"/> <co
xml:id="programlisting_head_foot_afterdef"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="simplePageLayout">
<fo:static-content flow-name="xsl-region-before"> <co
xml:id="programlisting_head_foot_beforeflow"/>
<fo:block
font-weight="bold"
font-size="8pt">Headertext</fo:block>
</fo:static-content>
<fo:static-content flow-name="xsl-region-after"> <co
xml:id="programlisting_head_foot_afterflow"/>
<fo:block>
<fo:page-number/>
</fo:block>
</fo:static-content>
<fo:flow flow-name="xsl-region-body">
<fo:block space-after="8mm">Dumb text .. dumb text.</fo:block>
<fo:block space-after="8mm">Dumb text .. dumb text.</fo:block>
<fo:block space-after="8mm">More text .. more text.</fo:block>
<fo:block space-after="8mm">More text .. more text.</fo:block>
<fo:block space-after="8mm">More text .. more text.</fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root></programlisting>
</figure>
<calloutlist>
<callout arearefs="programlisting_head_foot_bodydef">
<para>Defining the body region.</para>
</callout>
<callout arearefs="programlisting_head_foot_beforedef programlisting_head_foot_afterdef">
<para>Defining two regions at the top and bottom of each page. The
<code>extent</code> attribute denotes the height of these regions.
<emphasis>Caveat</emphasis>: These regions do not shrink the body
region. They occupy part of the space set aside by the
<code>margin-top</code> or <code>margin-bottom</code> value defined in the corresponding
<tag class="starttag">fo:region-body</tag> element. So if we
consider for example the <tag>fo:region-before</tag> we have to
obey:</para>
<para>extent <= margin-top</para>
<para>Otherwise we may not even see any output.</para>
</callout>
<callout arearefs="programlisting_head_foot_beforeflow">
<para>A <code>fo:static-content</code> denotes text portions which
are decoupled from the <quote>usual</quote> text flow. For example
as a book's chapter advances over multiple pages we expect the
constant chapter's title to appear on top of each page. In the
current example the static string <code>Headertext</code> will
appear on each page's top for the whole <tag
class="starttag">fo:page-sequence</tag> in which it is defined.
Notice the <code>flow-name="xsl-region-before"</code> reference to
the region being defined in <coref
linkend="programlisting_head_foot_beforedef"/>.</para>
</callout>
<callout arearefs="programlisting_head_foot_afterflow">
<para>We do the same here for the page's footer. Instead of static
text we output <tag class="emptytag">fo:page-number</tag> yielding the current page's
number.</para>
<para>This time <code>flow-name="xsl-region-after"</code> references
the region definition in <coref
linkend="programlisting_head_foot_afterdef"/>. Unless a region is
renamed via its <code>region-name</code> attribute, the attribute
<code>flow-name</code> takes one of the following five default
values, one for each possible region definition within a
layout:</para>
<informaltable>
<?dbhtml table-width="50%" ?>
<?dbfo table-width="50%" ?>
<tgroup cols="2">
<colspec align="left" colwidth="1*"/>
<colspec align="left" colwidth="1*"/>
<tbody>
<row>
<entry><tag class="starttag">fo:region-body</tag></entry>
<entry>xsl-region-body</entry>
</row>
<row>
<entry><tag class="starttag">fo:region-before</tag></entry>
<entry>xsl-region-before</entry>
</row>
<row>
<entry><tag class="starttag">fo:region-after</tag></entry>
<entry>xsl-region-after</entry>
</row>
<row>
<entry><tag class="starttag">fo:region-start</tag></entry>
<entry>xsl-region-start</entry>
</row>
<row>
<entry><tag class="starttag">fo:region-end</tag></entry>
<entry>xsl-region-end</entry>
</row>
</tbody>
</tgroup>
</informaltable>
</callout>
</calloutlist>
<para>This results in two pages with page numbers 1 and 2:</para>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/headfoot.fig"/>
</imageobject>
</mediaobject>
<para>The free chapter of the <xref linkend="bib_Harold04"/> book contains
additional information on extended <link
xlink:href="http://www.cafeconleche.org/books/bible2/chapters/ch18.html#d1e2250">layout
definitions</link>. The <orgname
xlink:href="http://w3.org">W3C</orgname> as the holder of the FO
standard defines the elements <link
xlink:href="http://www.w3.org/TR/xsl/#fo_layout-master-set">fo:layout-master-set</link>,
<link
xlink:href="http://www.w3.org/TR/xsl/#fo_simple-page-master">fo:simple-page-master</link>
and <link
xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link>.</para>
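<para>The side regions are used in the same way as the header and footer
regions. The following fragment is a minimal sketch and not one of the
lecture's worked examples: the master name <code>sideLayout</code>, the
lengths and the text <quote>Side note</quote> are made-up illustration
values. In a left-to-right document <tag
class="starttag">fo:region-start</tag> denotes the left side of the
page:</para>
<programlisting language="none">...
<fo:simple-page-master master-name="sideLayout"
    page-width="50mm" page-height="80mm">
  <!-- The body's left margin leaves room for the 10mm side region -->
  <fo:region-body margin-left="12mm"/>
  <fo:region-start extent="10mm"/>
</fo:simple-page-master>
...
<fo:page-sequence master-reference="sideLayout">
  <!-- Static content for the side region, repeated on every page -->
  <fo:static-content flow-name="xsl-region-start">
    <fo:block>Side note</fo:block>
  </fo:static-content>
  <fo:flow flow-name="xsl-region-body">
    <fo:block>Some body text ...</fo:block>
  </fo:flow>
</fo:page-sequence> ...</programlisting>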
</section>
<section xml:id="foContainer">
<title>Important Objects</title>
<section xml:id="fo_block">
<title><code>fo:block</code></title>
<para>The FO standard borrows a lot from the CSS standard. Most
formatting objects may have <link
xlink:href="http://www.w3.org/TR/xsl/#section-N19349-Description-of-Property-Groups">CSS
like properties</link> with similar semantics; some properties have
been added. We take a <link
xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link>
container as an example:</para>
<figure xml:id="blockInline">
<title>A <link
xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> with
a <link
xlink:href="http://www.w3.org/TR/xsl/#fo_inline">fo:inline</link>
descendant.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/blockprop.fo.pdf"/>
</imageobject>
</mediaobject>
<programlisting language="none">...
<fo:block font-weight='bold'
border-bottom-style='dashed'
border-style='solid'
border='1mm'>A lot of attributes and <fo:inline background-color='black'
color='white'>inverted</fo:inline> text.</fo:block> ...</programlisting>
</figure>
<para>The <link
xlink:href="http://www.w3.org/TR/xsl/#fo_inline">fo:inline</link>
descendant serves as a means to change the <quote>current</quote>
property set. In HTML/CSS this may be achieved by using the
<code>SPAN</code> tag:</para>
<programlisting language="none"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Blocks/spans and CSS</title>
</head>
<body>
<h1>Blocks/spans and CSS</h1>
<p style="font-weight: bold; border: 1mm;
border-style: solid; border-bottom-style: dashed;"
>A lot of attributes and
<span style="color: white;background-color: black;"
>inverted</span> text.</p>
</body>
</html></programlisting>
<para>Though being encapsulated in a <code>style</code> attribute we
find a one-to-one correspondence between FO and CSS in this case. The
HTML rendering works as expected:<mediaobject>
<imageobject>
<imagedata fileref="Ref/Screen/mozparaspancss.screen.png"/>
</imageobject>
</mediaobject></para>
</section>
<section xml:id="fo_list">
<title>Lists</title>
<para>The simplest type of list is the unnumbered (itemized) list as
expressed by the <code>UL</code>/<code>LI</code> tags in HTML.
FO allows a much more detailed parametrization regarding indents and
distances between labels and item content. Relevant elements are <link
xlink:href="http://www.w3.org/TR/xsl/#fo_list-block">fo:list-block</link>,
<link
xlink:href="http://www.w3.org/TR/xsl/#fo_list-item">fo:list-item</link>
and <link
xlink:href="http://www.w3.org/TR/xsl/#fo_list-item-body">fo:list-item-body</link>.
The drawback is a more complex setup for <quote>default</quote>
lists:</para>
<figure xml:id="listItemize">
<title>An itemized list and result.</title>
<programlisting language="none">...
<fo:list-block
provisional-distance-between-starts="2mm">
<fo:list-item>
<fo:list-item-label end-indent="label-end()">
<fo:block>&#8226;</fo:block>
</fo:list-item-label>
<fo:list-item-body start-indent="body-start()">
<fo:block>Flowers</fo:block>
</fo:list-item-body>
</fo:list-item>
<fo:list-item>
<fo:list-item-label end-indent="label-end()">
<fo:block>&#8226;</fo:block>
</fo:list-item-label>
<fo:list-item-body start-indent="body-start()">
<fo:block>Animals</fo:block>
</fo:list-item-body>
</fo:list-item>
</fo:list-block> ...</programlisting>
<mediaobject>
<imageobject>
<imagedata align="left" fileref="Ref/Fig/itemize.fo.pdf"/>
</imageobject>
</mediaobject>
</figure>
<para>The result looks somewhat primitive in relation to the amount of
source code it necessitates. The power of these constructs shows up
when trying to format nested lists of possibly different types like
enumerations or definition lists under the requirement of
typographical excellence. More complex examples are presented in the <link
xlink:href="http://www.cafeconleche.org/books/bible2/chapters/ch18.html#d1e4979">XML Bible
book</link> by <xref linkend="bib_Harold04"/>.</para>
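<para>As a minimal sketch of an enumeration (an illustration only, with
assumed distances), the bullet character of <xref linkend="listItemize"/>
may be replaced by explicit numbers. The wider value of
<code>provisional-distance-between-starts</code> and the additional
<code>provisional-label-separation</code> are assumptions chosen to
leave room for the label text:</para>
<programlisting language="none">...
<fo:list-block
    provisional-distance-between-starts="5mm"
    provisional-label-separation="1mm">
  <fo:list-item>
    <fo:list-item-label end-indent="label-end()">
      <fo:block>1.</fo:block>
    </fo:list-item-label>
    <fo:list-item-body start-indent="body-start()">
      <fo:block>Flowers</fo:block>
    </fo:list-item-body>
  </fo:list-item>
  <fo:list-item>
    <fo:list-item-label end-indent="label-end()">
      <fo:block>2.</fo:block>
    </fo:list-item-label>
    <fo:list-item-body start-indent="body-start()">
      <fo:block>Animals</fo:block>
    </fo:list-item-body>
  </fo:list-item>
</fo:list-block> ...</programlisting>
<para>When FO is generated by a stylesheet the label text would usually
be computed rather than typed in by hand.</para>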
</section>
<section xml:id="leaderRule">
<title>Leaders and rules</title>
<titleabbrev>Leaders/rules</titleabbrev>
<para>Sometimes adjustable horizontal space between two neighbouring
objects has to be filled e.g. in a book's table of contents. The <link
xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link>
serves this purpose:</para>
<figure xml:id="leaderToc">
<title>Two simulated entries in a table of contents.</title>
<programlisting language="none">...
<fo:block text-align-last='justify'>Valid
XML<fo:leader leader-pattern="dots"/>
page 7</fo:block>
<fo:block text-align-last='justify'>XSL
<fo:leader leader-pattern='dots'/>
page 42</fo:block> ...</programlisting>
<mediaobject>
<imageobject>
<imagedata align="left" fileref="Ref/Fig/leader.fo.pdf"/>
</imageobject>
</mediaobject>
</figure>
<para>The attribute value <link
xlink:href="http://www.w3.org/TR/xsl/#text-align-last">text-align-last</link>
= <code>'justify'</code> forces the <link
xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> to
extend to the available width of the current <link
xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link>
area. The <link
xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link>
inserts the necessary amount of content of the specified type defined
in <link
xlink:href="http://www.w3.org/TR/xsl/#leader-pattern">leader-pattern</link>
to fill up the gap between its neighbouring components. This principle
can be extended to multiple objects:</para>
<figure xml:id="leaderMulti">
<title>Four entries separated by equal amounts of dotted
space.</title>
<programlisting language="none"><fo:block text-align-last='justify'>A<fo:leader
leader-pattern="dots"/>B<fo:leader
leader-pattern="dots"/>C<fo:leader leader-pattern="dots"/>D</fo:block></programlisting>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/leadermulti.fo.pdf"/>
</imageobject>
</mediaobject>
</figure>
<para>A <link
xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> may
also be used to draw horizontal lines to separate objects. In this
case there are no neighbouring components within the
<quote>current</quote> line in which the <link
xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link>
appears. This is frequently used to draw a border between
<code>xsl-region-body</code> and <code>xsl-region-before</code> and/or
<code>xsl-region-after</code>:</para>
<figure xml:id="leaderSeparate">
<title>A horizontal line separator between header and body of a
page.</title>
<programlisting language="none">...
<fo:page-sequence master-reference="simplePageLayout">
<fo:static-content flow-name="xsl-region-before">
<fo:block text-align-last='justify'>FO<fo:leader/>page 5</fo:block>
<fo:block text-align-last='justify'>
<fo:leader leader-pattern="rule" leader-length="100%"/>
</fo:block>
</fo:static-content>
<fo:flow flow-name="xsl-region-body">
<fo:block>Some body text ...</fo:block>
</fo:flow>
</fo:page-sequence>...</programlisting>
<mediaobject>
<imageobject>
<imagedata align="left" fileref="Ref/Fig/separate.fo.pdf"/>
</imageobject>
</mediaobject>
</figure>
<para>Note the empty leader <code><</code> <link
xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link>
<code>/></code> between the <quote> <code>FO</code> </quote> and
the <quote>page 5</quote> text node. It inserts horizontal whitespace so
that the page number is aligned with the header's right edge. This is in
accordance with the <link
xlink:href="http://www.w3.org/TR/xsl/#leader-pattern">leader-pattern</link>
attribute's default value <code>space</code>.</para>
</section>
<section xml:id="pageNumbering">
<title>Page numbers</title>
<para>We already saw an example of page numbering via <link
xlink:href="http://www.w3.org/TR/xsl/#fo_page-number">fo:page-number</link>
in <xref linkend="paramHeadFoot"/>. Sometimes a different style for
page numbering is desired. The default page numbering style may be
changed by means of the <link
xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link>
element's attribute <link
xlink:href="http://www.w3.org/TR/xsl/#format">format</link>. For a
closer explanation the <link
xlink:href="http://www.w3.org/TR/2007/REC-xslt20-20070123/#convert">W3C
XSLT standard's documentation</link> may be consulted:</para>
<figure xml:id="pageNumberingRoman">
<title>Roman style page numbers.</title>
<programlisting language="none">...
<fo:page-sequence format="i"
master-reference="simplePageLayout">
<fo:static-content
flow-name="xsl-region-after">
<fo:block text-align-last='justify'>
<fo:leader leader-pattern="rule"
leader-length="100%"/>
</fo:block>
<fo:block font-weight="bold">
<fo:page-number/>
</fo:block>
</fo:static-content>
<fo:flow flow-name="xsl-region-body">
<fo:block>Some text...</fo:block>
<fo:block>More text, more text,
more text.</fo:block>
<fo:block>More text, more text,
more text.</fo:block>
<fo:block>Enough text.</fo:block>
</fo:flow>
</fo:page-sequence> ...</programlisting>
<mediaobject>
<imageobject>
<imagedata align="left" fileref="Ref/Fig/pageStack.fig"/>
</imageobject>
</mediaobject>
</figure>
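<para>A common use is a document whose front matter carries roman page
numbers while the main text starts again at page 1 with arabic numbers.
The following fragment is a sketch under the assumption that the layout
<code>simplePageLayout</code> defines a <tag
class="starttag">fo:region-after</tag> as in <xref
linkend="paramHeadFoot"/>. The block texts are placeholders. The
attribute <code>initial-page-number</code> restarts the numbering, which
would otherwise continue from the preceding page sequence:</para>
<programlisting language="none">...
<!-- Front matter: i, ii, iii, ... -->
<fo:page-sequence format="i"
    master-reference="simplePageLayout">
  <fo:static-content flow-name="xsl-region-after">
    <fo:block><fo:page-number/></fo:block>
  </fo:static-content>
  <fo:flow flow-name="xsl-region-body">
    <fo:block>Preface ...</fo:block>
  </fo:flow>
</fo:page-sequence>
<!-- Main text: arabic numbers, counting restarts at 1 -->
<fo:page-sequence format="1" initial-page-number="1"
    master-reference="simplePageLayout">
  <fo:static-content flow-name="xsl-region-after">
    <fo:block><fo:page-number/></fo:block>
  </fo:static-content>
  <fo:flow flow-name="xsl-region-body">
    <fo:block>First chapter ...</fo:block>
  </fo:flow>
</fo:page-sequence> ...</programlisting>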
</section>
<section xml:id="foMarker">
<title>Marker</title>
<para>A dictionary typically shows the range of entries appearing on a
page in its header. Since this text changes from page to page, fixed
static content is not sufficient. Each entry therefore carries a <tag
class="starttag">fo:marker</tag> as its first child, and the header uses
<tag class="starttag">fo:retrieve-marker</tag> to pick up the first and
the last marker of class <code>alpha</code> starting on the current
page:</para>
<figure xml:id="dictionary">
<title>A dictionary with running page headers.</title>
<programlisting language="none">...
<fo:page-sequence
master-reference="simplePageLayout">
<fo:static-content flow-name="xsl-region-before">
<fo:block font-weight="bold">
<fo:retrieve-marker retrieve-class-name="alpha"
retrieve-position="first-starting-within-page"
/>-<fo:retrieve-marker
retrieve-position="last-starting-within-page"
retrieve-class-name="alpha"/>
</fo:block>
<fo:block text-align-last='justify'>
<fo:leader leader-pattern="rule" leader-length="100%"/></fo:block>
</fo:static-content>
<fo:flow flow-name="xsl-region-body">
<fo:block>
<fo:marker marker-class-name="alpha">A
</fo:marker>Ant</fo:block>
<fo:block>
<fo:marker marker-class-name="alpha">B
</fo:marker>Bug</fo:block>
<fo:block>
<fo:marker marker-class-name="alpha">L
</fo:marker>Lion</fo:block>
<fo:block>
<fo:marker marker-class-name="alpha">N
</fo:marker>Nose</fo:block>
<fo:block>
<fo:marker marker-class-name="alpha">P
</fo:marker>Peg</fo:block>
</fo:flow>
</fo:page-sequence> ...</programlisting>
<mediaobject>
<imageobject>
<imagedata align="left" fileref="Ref/Fig/dictionaryStack.fig"/>
</imageobject>
</mediaobject>
</figure>
</section>
<section xml:id="foIntRef">
<title>Internal references</title>
<titleabbrev>References</titleabbrev>
<para>Regarding printed documents we may define two categories of
document internal references:</para>
<variablelist>
<varlistentry>
<term><emphasis>Page number references</emphasis></term>
<listitem>
<para>This is the <quote>classical</quote> type of a reference
e.g. in books. An author refers the reader to a distant location
by writing <quote>... see further explanation in section 4.5 on
page 234</quote>. A book's table of contents assigning page
numbers to topics is another example. This way the
implementation of a reference relies solely on the features a
printed document offers.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis>Hypertext references</emphasis></term>
<listitem>
<para>This way of implementing references utilizes features of
(online) viewers for printable documents. For example PDF
viewers like <productname
xlink:href="http://www.adobe.com">Adobe's Acrobat
reader</productname> or the evince application are able to
follow hypertext links in a fashion known from HTML browsers.
This viewer feature is based on hypertext capabilities defined
in Adobe's PDF de facto standard.</para>
</listitem>
</varlistentry>
</variablelist>
<para>Of course the second type of references is limited to people who
use an online viewer application instead of reading a document from
physical paper.</para>
<para>We now show the implementation of <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
based page references. As already discussed for <link
xlink:href="http://www.w3.org/TR/xml#id">ID</link> / <link
xlink:href="http://www.w3.org/TR/xml#idref">IDREF</link> pairs we need
a link destination (anchor) and a link source. The <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
standard uses the same anchor implementation as in XML for <link
xlink:href="http://www.w3.org/TR/xml#id">ID</link> typed attributes:
<abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
objects <emphasis>may</emphasis> have an attribute <link
xlink:href="http://www.w3.org/TR/xsl/#id">id</link> with a document
wide unique value. The <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
element <link
xlink:href="http://www.w3.org/TR/xsl/#fo_page-number-citation">fo:page-number-citation</link>
is used to actually create a page reference via its attribute <link
xlink:href="http://www.w3.org/TR/xsl/#ref-id">ref-id</link>:</para>
<figure xml:id="refJavaXml">
<title>Two blocks mutually referencing each other by page number.</title>
<programlisting language="none">...
<fo:flow flow-name='xsl-region-body'>
<fo:block id='xml'>Java section see page
<fo:page-number-citation ref-id='java'/>.
</fo:block>
<fo:block id='java'>XML section see page
<fo:page-number-citation ref-id='xml'/>.
</fo:block>
</fo:flow> ...</programlisting>
<mediaobject>
<imageobject>
<imagedata align="left" fileref="Ref/Fig/pagerefStack.fig"/>
</imageobject>
</mediaobject>
</figure>
<para>NB: Be careful defining <link
xlink:href="http://www.w3.org/TR/xsl/#id">id</link> attributes for
objects being descendants of <link
xlink:href="http://www.w3.org/TR/xsl/#fo_static-content">fo:static-content</link>
nodes. Such objects typically appear on multiple pages and are
therefore not unique anchors. A reference carrying such an id value
thus actually refers to 1 <= n values on n different pages.
Typically a user agent will choose the first object of this set when
clicking the link. So in effect the parent <link
xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link>
is chosen as the effective link target.</para>
<para>The element <link
xlink:href="http://www.w3.org/TR/xsl/#fo_basic-link">fo:basic-link</link>
creates PDF hypertext links. We extend the previous example:</para>
<figure xml:id="refJavaXmlHyper">
<title>Two blocks with mutual page- and hypertext
references.</title>
<programlisting language="none"><fo:flow flow-name='xsl-region-body'>
<fo:block id='xml'>Java section see <fo:basic-link color="blue"
internal-destination="java">page <fo:page-number-citation
ref-id='java'/>.</fo:basic-link></fo:block>
<fo:block id='java'>XML section see
<fo:basic-link color="blue"
internal-destination="xml">page <fo:page-number-citation
ref-id='xml'/>.</fo:basic-link></fo:block>
</fo:flow></programlisting>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/pagerefhyperStack.fig"/>
</imageobject>
</mediaobject>
</figure>
</section>
<section xml:id="pdfBookmarks">
<title>PDF bookmarks</title>
<titleabbrev>Bookmarks</titleabbrev>
<para>The PDF specification allows the definition of so-called bookmarks
offering explorer-like navigation:</para>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Screen/pdfbookmarks.screen.png"/>
</imageobject>
</mediaobject>
<para>PDF bookmarks are <link
xlink:href="http://www.w3.org/TR/2006/REC-xsl11-20061205/#d0e14206">part
of the XSL-FO 1.1</link> Standard. Some <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
processors still continue to use proprietary solutions for bookmark
creation with respect to the older <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
1.0 standard. For details of bookmark extensions by
<orgname>RenderX</orgname>'s processor see <link
xlink:href="http://www.renderx.com/tutorial.html#PDF_Bookmarks">xep's
documentation</link>.</para>
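<para>As a minimal sketch of the standardized XSL-FO 1.1 approach, the
following document outline defines two bookmarks. It assumes a processor
supporting XSL-FO 1.1 bookmarks and reuses the <code>id</code> values
<code>xml</code> and <code>java</code> from <xref
linkend="refJavaXml"/>. The bookmark titles are illustration values, and
whether a given processor honours this markup has to be checked in its
documentation:</para>
<programlisting language="none"><fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set> ... </fo:layout-master-set>
  <!-- The bookmark tree precedes the first page sequence -->
  <fo:bookmark-tree>
    <fo:bookmark internal-destination="xml">
      <fo:bookmark-title>XML section</fo:bookmark-title>
    </fo:bookmark>
    <fo:bookmark internal-destination="java">
      <fo:bookmark-title>Java section</fo:bookmark-title>
    </fo:bookmark>
  </fo:bookmark-tree>
  <fo:page-sequence master-reference="simplePageLayout">
    ...
  </fo:page-sequence>
</fo:root></programlisting>
<para>Nesting a <tag class="starttag">fo:bookmark</tag> inside another
one after its title yields a hierarchical outline.</para>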
</section>
</section>
<section xml:id="xml2fo">
<title>Constructing <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
from XML documents</title>
<titleabbrev><abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
from XML</titleabbrev>
<para>So far we have learnt some basic <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
elements. As with HTML we typically generate FO code from other sources
rather than crafting it by hand. The general picture is:</para>
<figure xml:id="htmlFoProduction">
<title>Different target formats from common source.</title>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/crossmedia.fig" scale="65"/>
</imageobject>
<caption>
<para>We may generate both online and printed documentation from a
common source. This requires style sheets for the desired
destination formats in question.</para>
</caption>
</mediaobject>
</figure>
<para>We discussed the <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
standard as an input format for printable output production by a
renderer. In this way a <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
document is similar to HTML being a format to be rendered by a web
browser for visual (screen oriented) output production. The
transformation from an XML source (e.g. a memo document) to <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
is still missing. As for HTML we may use <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> as a
transformation means. We generate the sender's surname from a memo
document instance:</para>
<figure xml:id="memo2fosurname">
<title>Generating a sender's surname for printing.</title>
<programlisting language="none"><?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<fo:root>
<fo:layout-master-set>
<fo:simple-page-master master-name="simplePageLayout"
page-width="294mm" page-height="210mm" margin="5mm">
<fo:region-body margin="15mm"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="simplePageLayout">
<fo:flow flow-name="xsl-region-body">
<fo:block font-size="20pt">
<xsl:text>Sender:</xsl:text>
<fo:inline font-weight='bold'>
<xsl:value-of select="memo/from/surname"/>
</fo:inline>
</fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root>
</xsl:template>
</xsl:stylesheet></programlisting>
</figure>
<para>A suitable XML document instance reads:</para>
<figure xml:id="memoMessage">
<title>A <code>memo</code> document instance.</title>
<programlisting language="none"><memo ...="memo.xsd">
<from>
<name>Martin</name>
<surname>Goik</surname>
</from>
<to>
<name>Adam</name>
<surname>Hacker</surname>
</to>
<to>
<name>Eve</name>
<surname>Intruder</surname>
</to>
<date year="2005" month="1" day="6"/>
<subject>Firewall problems</subject>
<content>
<para>Thanks for your excellent work.</para>
<para>Our firewall is definitely broken!</para>
</content>
</memo></programlisting>
</figure>
<para>Some remarks:</para>
<orderedlist>
<listitem>
<para>The <link
xlink:href="http://www.w3.org/TR/2007/REC-xslt20-20070123/#element-stylesheet">xsl:stylesheet</link>
element contains a namespace definition for the target FO document's
namespace, namely:</para>
<programlisting language="none">xmlns:fo="http://www.w3.org/1999/XSL/Format"</programlisting>
<para>This is required to use elements like <link
xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link>
belonging to the FO namespace.</para>
</listitem>
<listitem>
<para>The option value <code>indent="yes"</code> in <link
xlink:href="http://www.w3.org/TR/2007/REC-xslt20-20070123/#element-output">xsl:output</link>
is usually set to "no" in a production environment to avoid
whitespace related problems.</para>
</listitem>
<listitem>
<para>Generating a print format like PDF is actually a two-step
process. To generate <filename>message.pdf</filename> from
<filename>message.xml</filename> using a stylesheet
<filename>memo2fo.xsl</filename> we need the following calls:</para>
<variablelist>
<varlistentry>
<term><emphasis>XML document instance to FO</emphasis></term>
<listitem>
<programlisting language="none">xml2xml message.xml memo2fo.xsl -o message.fo</programlisting>
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis>FO to PDF</emphasis></term>
<listitem>
<programlisting language="none">fo2pdf -fo message.fo -pdf message.pdf</programlisting>
</listitem>
</varlistentry>
</variablelist>
<mediaobject>
<imageobject>
<imagedata fileref="Ref/Fig/xml2fo2pdf.fig"/>
</imageobject>
</mediaobject>
<para>If the intermediate <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
file is not needed for debugging, both steps may be combined into a
single call:</para>
<programlisting language="none">fo2pdf -xml message.xml -xsl memo2fo.xsl -pdf message.pdf</programlisting>
</listitem>
</orderedlist>
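<para>As a small extension of the stylesheet above, the following
sketch also lists the recipients of the memo. It assumes that each
<code>to</code> element carries a <code>name</code> and a
<code>surname</code> child as in the document instance shown before.
The label <quote>Recipient:</quote> is chosen for illustration
only:</para>
<programlisting language="none"><?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:fo="http://www.w3.org/1999/XSL/Format"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>
  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="simplePageLayout"
          page-width="294mm" page-height="210mm" margin="5mm">
          <fo:region-body margin="15mm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="simplePageLayout">
        <fo:flow flow-name="xsl-region-body">
          <fo:block font-size="20pt">
            <xsl:text>Sender: </xsl:text>
            <fo:inline font-weight="bold">
              <xsl:value-of select="memo/from/surname"/>
            </fo:inline>
          </fo:block>
          <!-- one block per recipient, i.e. per to element -->
          <xsl:for-each select="memo/to">
            <fo:block>
              <xsl:text>Recipient: </xsl:text>
              <xsl:value-of select="concat(name, ' ', surname)"/>
            </fo:block>
          </xsl:for-each>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
</xsl:stylesheet></programlisting>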
</section>
<section xml:id="foCatalog">
<title>Formatting a catalog.</title>
<titleabbrev>A catalog</titleabbrev>
<para>We now take the <link linkend="climbingCatalog">climbing catalog
example</link>, extended by prices, and incrementally create a series
of PDF versions, each improving on its predecessor.</para>
<qandaset defaultlabel="qanda" xml:id="idCatalogStart">
<title>A first PDF version of the catalog</title>
<qandadiv>
<qandaentry>
<question>
<para>Write an <abbrev
xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> script to
generate a starting version <filename
xlink:href="Ref/src/Dom/climbenriched.start.pdf">climbenriched.start.pdf</filename>.</para>
</question>
<answer>
<programlisting language="none"><?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<fo:root font-size="10pt">
<fo:layout-master-set>
<fo:simple-page-master master-name="productPage"
page-width="80mm" page-height="110mm" margin="5mm">
<fo:region-body margin="15mm"/>
<fo:region-before extent="10mm"/>
</fo:simple-page-master>
</fo:layout-master-set>
<xsl:apply-templates select="catalog/product" />
</fo:root>
</xsl:template>
<xsl:template match="product">
<fo:page-sequence master-reference="productPage">
<fo:static-content flow-name="xsl-region-before">
<fo:block font-weight="bold">
<xsl:value-of select="title"/>
</fo:block>
</fo:static-content>
<fo:flow flow-name="xsl-region-body">
<xsl:apply-templates select="description/para"/>
<fo:block>Price:<xsl:value-of select="@price"/></fo:block>
<fo:block>Order no:<xsl:value-of select="@id"/></fo:block>
</fo:flow>
</fo:page-sequence>
</xsl:template>
<xsl:template match="para">
<fo:block space-after="10px">
<xsl:value-of select="."/>
</fo:block>
</xsl:template>
</xsl:stylesheet></programlisting>
</answer>
</qandaentry>
<qandaentry xml:id="idCatalogProduct">
<question>
<label>Header, page numbers and table formatting</label>
<para>Extend <xref linkend="idCatalogStart"/> by adding page
numbers. The order number and price shall be formatted as a
table. Add a horizontal rule to each page's header. The result
should look like <filename
xlink:href="Ref/src/Dom/climbenriched.product.pdf">climbenriched.product.pdf</filename>.</para>
</question>
<answer>
<para>For a solution see <filename
xlink:href="Ref/src/Dom/catalog2fo.product.xsl">catalog2fo.product.xsl</filename>.</para>
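<para>A minimal sketch of how these three features might be
approached, not a copy of the referenced solution: a bottom border
on the header block serves as the rule, <code>fo:page-number</code>
in an additional <code>fo:region-after</code> yields the page number,
and a two-column <code>fo:table</code> holds price and order number.
The footer extent and the column widths are assumptions chosen to
fit the 80mm page:</para>
<programlisting language="none"><?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:fo="http://www.w3.org/1999/XSL/Format"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>
  <xsl:template match="/">
    <fo:root font-size="10pt">
      <fo:layout-master-set>
        <fo:simple-page-master master-name="productPage"
          page-width="80mm" page-height="110mm" margin="5mm">
          <fo:region-body margin="15mm"/>
          <fo:region-before extent="10mm"/>
          <!-- assumed footer region holding the page number -->
          <fo:region-after extent="10mm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <xsl:apply-templates select="catalog/product"/>
    </fo:root>
  </xsl:template>
  <xsl:template match="product">
    <fo:page-sequence master-reference="productPage">
      <fo:static-content flow-name="xsl-region-before">
        <!-- the bottom border acts as the header rule -->
        <fo:block font-weight="bold" border-bottom-style="solid"
          border-bottom-width="0.5pt">
          <xsl:value-of select="title"/>
        </fo:block>
      </fo:static-content>
      <fo:static-content flow-name="xsl-region-after">
        <fo:block text-align="center"><fo:page-number/></fo:block>
      </fo:static-content>
      <fo:flow flow-name="xsl-region-body">
        <xsl:apply-templates select="description/para"/>
        <!-- body width: 80mm - 2 * 5mm - 2 * 15mm = 40mm -->
        <fo:table table-layout="fixed" width="100%">
          <fo:table-column column-width="20mm"/>
          <fo:table-column column-width="20mm"/>
          <fo:table-body>
            <fo:table-row>
              <fo:table-cell><fo:block>Price:</fo:block></fo:table-cell>
              <fo:table-cell><fo:block><xsl:value-of select="@price"/></fo:block></fo:table-cell>
            </fo:table-row>
            <fo:table-row>
              <fo:table-cell><fo:block>Order no:</fo:block></fo:table-cell>
              <fo:table-cell><fo:block><xsl:value-of select="@id"/></fo:block></fo:table-cell>
            </fo:table-row>
          </fo:table-body>
        </fo:table>
      </fo:flow>
    </fo:page-sequence>
  </xsl:template>
  <xsl:template match="para">
    <fo:block space-after="10px"><xsl:value-of select="."/></fo:block>
  </xsl:template>
</xsl:stylesheet></programlisting>
<para>Since each page sequence leaves its initial page number
unspecified, numbering simply continues from one product to the
next.</para>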
</answer>
</qandaentry>
<qandaentry xml:id="idCatalogToc">
<question>
<label>A table of contents.</label>
<para>Each product description's page number shall appear in a
table of contents together with the product's <code>title</code>
as in <filename
xlink:href="Ref/src/Dom/climbenriched.toc.pdf">climbenriched.toc.pdf</filename>.</para>
</question>
<answer>
<para>For a solution see <filename
xlink:href="Ref/src/Dom/catalog2fo.toc.xsl">catalog2fo.toc.xsl</filename>.</para>
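<para>The core idea may be sketched as follows, again without
claiming to reproduce the referenced solution: each product page
starts with a block carrying an <code>id</code> generated by
<code>generate-id()</code>, and the table of contents refers to
that <code>id</code> via <code>fo:page-number-citation</code>. The
heading text <quote>Contents</quote> and the mode name
<code>toc</code> are assumptions:</para>
<programlisting language="none"><?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:fo="http://www.w3.org/1999/XSL/Format"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>
  <xsl:template match="/">
    <fo:root font-size="10pt">
      <fo:layout-master-set>
        <fo:simple-page-master master-name="productPage"
          page-width="80mm" page-height="110mm" margin="5mm">
          <fo:region-body margin="15mm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <!-- table of contents: one line per product -->
      <fo:page-sequence master-reference="productPage">
        <fo:flow flow-name="xsl-region-body">
          <fo:block font-weight="bold" space-after="10px">Contents</fo:block>
          <xsl:apply-templates select="catalog/product" mode="toc"/>
        </fo:flow>
      </fo:page-sequence>
      <xsl:apply-templates select="catalog/product"/>
    </fo:root>
  </xsl:template>
  <xsl:template match="product" mode="toc">
    <fo:block text-align-last="justify">
      <xsl:value-of select="title"/>
      <fo:leader leader-pattern="dots"/>
      <!-- resolves to the page containing the block with this id -->
      <fo:page-number-citation ref-id="{generate-id()}"/>
    </fo:block>
  </xsl:template>
  <xsl:template match="product">
    <fo:page-sequence master-reference="productPage">
      <fo:flow flow-name="xsl-region-body">
        <!-- generate-id() yields the same value as in mode toc -->
        <fo:block id="{generate-id()}" font-weight="bold">
          <xsl:value-of select="title"/>
        </fo:block>
        <xsl:apply-templates select="description/para"/>
      </fo:flow>
    </fo:page-sequence>
  </xsl:template>
  <xsl:template match="para">
    <fo:block space-after="10px"><xsl:value-of select="."/></fo:block>
  </xsl:template>
</xsl:stylesheet></programlisting>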
</answer>
</qandaentry>
<qandaentry xml:id="idCatalogToclink">
<question>
<label>A table of contents with hypertext links.</label>
<para>The table of contents' entries may act as hypertext
links in viewers supporting them, as in <filename
xlink:href="Ref/src/Dom/climbenriched.toclink.pdf">climbenriched.toclink.pdf</filename>.
In addition, include the document's <tag
class="starttag">introduction</tag>.</para>
</question>
<answer>
<para>For a solution see <filename
xlink:href="Ref/src/Dom/catalog2fo.toclink.xsl">catalog2fo.toclink.xsl</filename>.</para>
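<para>A possible approach, given here only as a sketch: wrap each
entry's title in an <code>fo:basic-link</code> whose
<code>internal-destination</code> equals the <code>id</code> of a
block on the product's page. The sketch assumes that <tag
class="starttag">introduction</tag> is a child of <tag
class="starttag">catalog</tag> containing <tag
class="starttag">para</tag> elements. The link color and the
placement of the introduction below the table of contents are
arbitrary choices:</para>
<programlisting language="none"><?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:fo="http://www.w3.org/1999/XSL/Format"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>
  <xsl:template match="/">
    <fo:root font-size="10pt">
      <fo:layout-master-set>
        <fo:simple-page-master master-name="productPage"
          page-width="80mm" page-height="110mm" margin="5mm">
          <fo:region-body margin="15mm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="productPage">
        <fo:flow flow-name="xsl-region-body">
          <fo:block font-weight="bold" space-after="10px">Contents</fo:block>
          <xsl:apply-templates select="catalog/product" mode="toc"/>
          <!-- assumption: catalog/introduction holds para children -->
          <xsl:apply-templates select="catalog/introduction/para"/>
        </fo:flow>
      </fo:page-sequence>
      <xsl:apply-templates select="catalog/product"/>
    </fo:root>
  </xsl:template>
  <xsl:template match="product" mode="toc">
    <fo:block text-align-last="justify">
      <!-- clickable title pointing at the product's first block -->
      <fo:basic-link internal-destination="{generate-id()}" color="blue">
        <xsl:value-of select="title"/>
      </fo:basic-link>
      <fo:leader leader-pattern="dots"/>
      <fo:page-number-citation ref-id="{generate-id()}"/>
    </fo:block>
  </xsl:template>
  <xsl:template match="product">
    <fo:page-sequence master-reference="productPage">
      <fo:flow flow-name="xsl-region-body">
        <!-- link target -->
        <fo:block id="{generate-id()}" font-weight="bold">
          <xsl:value-of select="title"/>
        </fo:block>
        <xsl:apply-templates select="description/para"/>
      </fo:flow>
    </fo:page-sequence>
  </xsl:template>
  <xsl:template match="para">
    <fo:block space-after="10px"><xsl:value-of select="."/></fo:block>
  </xsl:template>
</xsl:stylesheet></programlisting>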
</answer>
</qandaentry>
<qandaentry xml:id="idCatalogFinal">
<question>
<label>A final version.</label>
<para>Add the following features:</para>
<orderedlist>
<listitem>
<para>Number the table of contents pages i, ii, iii, iv and so
on. Start the product descriptions with page 1. Each page's
footer shall display the text <quote>page xx of yy</quote>.
This requires defining an anchor <code>id</code> on the <abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
document's last page.</para>
</listitem>
<listitem>
<para>Add PDF bookmarks by using <orgname>XEP</orgname>'s
<abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>
extensions. This requires the namespace declaration
<code>xmlns:rx="http://www.renderx.com/XSL/Extensions"</code>
in the XSLT script's header.</para>
</listitem>
</orderedlist>
<para>The result may look like <filename
xlink:href="Ref/src/Dom/climbenriched.final.pdf">climbenriched.final.pdf</filename>.
N.B.: It may take some effort to achieve this result. This
effort is left to the <emphasis>interested</emphasis>
participants.</para>
</question>
<answer>
<para>For a solution see <filename
xlink:href="Ref/src/Dom/catalog2fo.toclink.xsl">catalog2fo.toclink.xsl</filename>.</para>
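<para>The page numbering part might be sketched as below. This is
an illustration under stated assumptions rather than the referenced
solution: the anchor name <code>lastPage</code> and the heading
<quote>Contents</quote> are chosen freely, and the
<orgname>XEP</orgname> bookmark extension is not covered. The
table of contents uses <code>format="i"</code>, the first product
page sequence restarts at 1, and an empty block at the very end of
the last product provides the anchor cited in the footer:</para>
<programlisting language="none"><?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:fo="http://www.w3.org/1999/XSL/Format"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>
  <xsl:template match="/">
    <fo:root font-size="10pt">
      <fo:layout-master-set>
        <fo:simple-page-master master-name="productPage"
          page-width="80mm" page-height="110mm" margin="5mm">
          <fo:region-body margin="15mm"/>
          <fo:region-after extent="10mm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <!-- table of contents pages are numbered i, ii, iii, ... -->
      <fo:page-sequence master-reference="productPage"
        initial-page-number="1" format="i">
        <fo:static-content flow-name="xsl-region-after">
          <fo:block text-align="center"><fo:page-number/></fo:block>
        </fo:static-content>
        <fo:flow flow-name="xsl-region-body">
          <fo:block font-weight="bold" space-after="10px">Contents</fo:block>
          <xsl:for-each select="catalog/product">
            <fo:block><xsl:value-of select="title"/></fo:block>
          </xsl:for-each>
        </fo:flow>
      </fo:page-sequence>
      <xsl:apply-templates select="catalog/product"/>
    </fo:root>
  </xsl:template>
  <xsl:template match="product">
    <fo:page-sequence master-reference="productPage" format="1">
      <!-- only the first product restarts the numbering at 1 -->
      <xsl:if test="position() = 1">
        <xsl:attribute name="initial-page-number">1</xsl:attribute>
      </xsl:if>
      <fo:static-content flow-name="xsl-region-after">
        <fo:block text-align="center">page <fo:page-number/> of
          <fo:page-number-citation ref-id="lastPage"/></fo:block>
      </fo:static-content>
      <fo:flow flow-name="xsl-region-body">
        <fo:block font-weight="bold"><xsl:value-of select="title"/></fo:block>
        <xsl:apply-templates select="description/para"/>
        <!-- anchor on the document's last page -->
        <xsl:if test="position() = last()">
          <fo:block id="lastPage"/>
        </xsl:if>
      </fo:flow>
    </fo:page-sequence>
  </xsl:template>
  <xsl:template match="para">
    <fo:block space-after="10px"><xsl:value-of select="."/></fo:block>
  </xsl:template>
</xsl:stylesheet></programlisting>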
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</section>
</chapter>
<appendix>
<title>W3C production rules</title>
<productionset>
<title><link
xlink:href="http://www.w3.org/TR/2008/REC-xml-20081126/#charsets">Characters</link></title>
<production xml:id="w3RecXml_NT-Letter">
<lhs>Letter</lhs>
<rhs><nonterminal def="#w3RecXml_NT-BaseChar">BaseChar</nonterminal> |
<nonterminal
def="#w3RecXml_NT-Ideographic">Ideographic</nonterminal></rhs>
</production>
<production xml:id="w3RecXml_NT-BaseChar">
<lhs>BaseChar</lhs>
<rhs>[#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x00D6]
| [#x00D8-#x00F6] | [#x00F8-#x00FF] | [#x0100-#x0131]
| [#x0134-#x013E] |...(values omitted here, see W3C
documentation)</rhs>
</production>
<production xml:id="w3RecXml_NT-Ideographic">
<lhs>Ideographic</lhs>
<rhs>[#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]</rhs>
</production>
<production xml:id="w3RecXml_NT-CombiningChar">
<lhs>CombiningChar</lhs>
<rhs>[#x0300-#x0345] | ...(values omitted here)</rhs>
</production>
<production xml:id="w3RecXml_NT-Digit">
<lhs>Digit</lhs>
<rhs>[#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9]
| [#x0966-#x096F] | [#x09E6-#x09EF] | [#x0A66-#x0A6F]
| [#x0AE6-#x0AEF] | [#x0B66-#x0B6F] | [#x0BE7-#x0BEF]
| [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] | [#x0D66-#x0D6F]
| [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29]</rhs>
</production>
<production xml:id="w3RecXml_NT-Extender">
<lhs>Extender</lhs>
<rhs>#x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6
| #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]</rhs>
</production>
</productionset>
</appendix>
<appendix>
<title>Glossary</title>
<para/>
<glossary>
<glossentry xml:id="gloss_API">
<glossterm><abbrev xlink:href="http://en.wikipedia.org/wiki/Api"
xml:id="abbr_api">API</abbrev></glossterm>
<glossdef>
<para>Application programming interface</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_SqlDdl">
<glossterm><abbrev
xlink:href="http://en.wikipedia.org/wiki/Data_definition_language"
xml:id="abbr_Ddl">DDL</abbrev> <link
linkend="gloss_SQL">(SQL)</link></glossterm>
<glossdef>
<para>Data definition language. The subset of <link
linkend="gloss_SQL">SQL</link> dealing with the creation of tables,
views etc.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_DOM">
<glossterm><acronym xlink:href="http://www.w3.org/DOM"
xml:id="abbr_Dom">DOM</acronym></glossterm>
<glossdef>
<para>The <link linkend="gloss_W3C">W3C</link> <link
xlink:href="http://www.w3.org/DOM">Document Object Model</link>
standard</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_DTD">
<glossterm><abbrev
xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration"
xml:id="abbr_Dtd">DTD</abbrev></glossterm>
<glossdef>
<para>Document Type Definition. An older standard, compared to
<link linkend="gloss_RelaxNG">RelaxNG</link> and <link
linkend="gloss_XmlSchema">XML Schema</link>, for defining an XML
document's grammar.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_EBNF">
<glossterm><abbrev>EBNF</abbrev></glossterm>
<glossdef>
<para>Extended Backus-Naur form.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_ftp">
<glossterm><abbrev
xlink:href="http://en.wikipedia.org/wiki/File_Transfer_Protocol"
xml:id="abbr_Ftp">ftp</abbrev></glossterm>
<glossdef>
<para>File Transfer Protocol</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_FO">
<glossterm><abbrev
xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section"
xml:id="abbr_Fo">FO</abbrev></glossterm>
<glossdef>
<para>The Formatting Objects Standard for printable output
generation</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_HDM">
<glossterm><orgname xlink:href="http://www.hdm-stuttgart.de"
xml:id="org_Hdm">HdM</orgname></glossterm>
<glossdef>
<para xml:lang="de">Hochschule der Medien.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_Hql">
<glossterm><abbrev
xlink:href="http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch16.html"
xml:id="abbr_Hql">HQL</abbrev></glossterm>
<glossdef>
<para>The <link
xlink:href="http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/queryhql.html">Hibernate
Query Language</link>.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_http">
<glossterm><abbrev xlink:href="http://www.w3.org/Protocols"
xml:id="abbr_Http">http</abbrev></glossterm>
<glossdef>
<para>The Hypertext Transfer Protocol</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_IDE">
<glossterm><abbrev
xlink:href="http://en.wikipedia.org/wiki/Integrated_development_environment"
xml:id="abbr_Ide">IDE</abbrev></glossterm>
<glossdef>
<para>Integrated Development Environment</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_J2EE">
<glossterm><trademark
xlink:href="http://www.oracle.com/technetwork/java/javaee"
xml:id="tm_J2ee">J2EE</trademark></glossterm>
<glossdef>
<para>Java Platform, Enterprise Edition</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_Java">
<glossterm><trademark
xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark></glossterm>
<glossdef>
<para>General purpose programming language with support for object
oriented concepts.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_Javadoc">
<glossterm><trademark
xlink:href="http://docs.oracle.com/javase/1.5.0/docs/guide/javadoc">Javadoc</trademark></glossterm>
<glossdef>
<para>A tool extracting documentation embedded in <link
linkend="gloss_Java"><trademark>Java</trademark></link> source
code.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_JDBC">
<glossterm><trademark
xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc"
xml:id="tm_Jdbc">JDBC</trademark></glossterm>
<glossdef>
<para>Java Database Connectivity.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_JDK">
<glossterm><trademark
xlink:href="http://www.oracle.com/technetwork/java/javase"
xml:id="tm_Jdk">JDK</trademark></glossterm>
<glossdef>
<para>Java Development Kit.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_JPA">
<glossterm><abbrev
xlink:href="http://www.javaworld.com/javaworld/jw-01-2008/jw-01-jpa1.html"
xml:id="abbr_Jpa">JPA</abbrev></glossterm>
<glossdef>
<para><link
xlink:href="http://www.javaworld.com/javaworld/jw-01-2008/jw-01-jpa1.html">Java
Persistence API</link></para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_JRE">
<glossterm><trademark
xlink:href="http://www.oracle.com/technetwork/java/javase"
xml:id="tm_Jre">JRE</trademark></glossterm>
<glossdef>
<para>Java Runtime Environment</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_MathML">
<glossterm><abbrev>MathML</abbrev></glossterm>
<glossdef>
<para><link xlink:href="http://www.w3.org/Math">Mathematical Markup
Language</link></para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_MIB">
<glossterm><orgname xlink:href="http://www.mi.hdm-stuttgart.de"
xml:id="org_Mib">MIB</orgname></glossterm>
<glossdef>
<para xml:lang="de">Bachelor Studiengang Medieninformatik</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_Mysql">
<glossterm><trademark
xlink:href="http://www.mysql.com/about/legal/trademark.html"
xml:id="tm_Mysql">MySQL</trademark></glossterm>
<glossdef>
<para>Open source Oracle database product</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_MP3">
<glossterm><abbrev>MP3</abbrev></glossterm>
<glossdef>
<para>Audio codec.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_ORM">
<glossterm><abbrev>ORM</abbrev></glossterm>
<glossdef>
<para>Object relational mapping.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_PHP">
<glossterm><abbrev
xlink:href="http://www.php.net">PHP</abbrev></glossterm>
<glossdef>
<para>PHP: Hypertext Preprocessor</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_RelaxNG">
<glossterm><acronym
xlink:href="http://relaxng.org">RelaxNG</acronym></glossterm>
<glossdef>
<para>An <link
xlink:href="http://standards.iso.org/ittf/PubliclyAvailableStandards/c037605_ISO_IEC_19757-2_2003(E).zip">ISO</link>
standard to define the grammar of XML documents. Primarily used for
document-oriented applications.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_SAX">
<glossterm><acronym
xlink:href="http://www.saxproject.org">SAX</acronym></glossterm>
<glossdef>
<para><link xlink:href="http://www.saxproject.org">Simple API for
XML</link>.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_SQL">
<glossterm><acronym
xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym></glossterm>
<glossdef>
<para><link xlink:href="http://en.wikipedia.org/wiki/SQL">Structured
query language</link>.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_SVG">
<glossterm><abbrev>SVG</abbrev></glossterm>
<glossdef>
<para><link xlink:href="http://www.w3.org/Graphics/SVG">Scalable
Vector Graphics</link>.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_TCP">
<glossterm><acronym
xlink:href="http://en.wikipedia.org/wiki/Transmission_Control_Protocol"
xml:id="abbr_Tcp">TCP</acronym></glossterm>
<glossdef>
<para>Transmission Control Protocol</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_URL">
<glossterm><abbrev xlink:href="http://www.ietf.org/rfc/rfc1738.txt"
xml:id="abbr_Url">URL</abbrev></glossterm>
<glossdef>
<para>Uniform Resource Locator</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_W3C">
<glossterm><orgname
xlink:href="http://www.w3.org">W3C</orgname></glossterm>
<glossdef>
<para>World Wide Web Consortium</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_XHTML">
<glossterm><abbrev>XHTML</abbrev></glossterm>
<glossdef>
<para>HTML as an <link linkend="gloss_XML">XML</link> <link
xlink:href="http://www.w3.org/TR/xhtml11">standard</link>.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_XML">
<glossterm><abbrev
xlink:href="http://www.w3.org/XML">XML</abbrev></glossterm>
<glossdef>
<para>The <link xlink:href="http://www.w3.org/XML">Extensible Markup
Language</link>.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_XmlSchema">
<glossterm>XML Schema</glossterm>
<glossdef>
<para>A W3C standard to define grammars for XML documents. Rich set
of features with respect to data modeling.</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_XPath">
<glossterm><acronym xlink:href="http://www.w3.org/TR/xpath"
xml:id="abbr_Xpath">XPath</acronym></glossterm>
<glossdef>
<para>XML Path Language</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_XSD">
<glossterm><abbrev
xlink:href="http://www.w3.org/XML/Schema">XSD</abbrev></glossterm>
<glossdef>
<para>XML Schema Definition language</para>
</glossdef>
</glossentry>
<glossentry xml:id="gloss_XSL">
<glossterm><abbrev xlink:href="http://www.w3.org/Style/XSL"
xml:id="abbr_Xsl">XSL</abbrev></glossterm>
<glossdef>
<para>Extensible Stylesheet Language</para>
</glossdef>
</glossentry>
</glossary>
</appendix>
<xi:include href="../glossary.xml" xpointer="element(/1)"/>
<xi:include href="../bibliography.xml" xpointer="element(/1)"/>
</part>