<?xml version="1.0" encoding="UTF-8"?> <part version="5.0" xml:id="sda1" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:svg="http://www.w3.org/2000/svg" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:html="http://www.w3.org/1999/xhtml" xmlns:db="http://docbook.org/ns/docbook"> <info> <title>Structured Data and Applications 1</title> <author> <personname><firstname>Martin</firstname> <surname>Goik</surname></personname> <affiliation> <orgname>http://medieninformatik.hdm-stuttgart.de</orgname> </affiliation> </author> <legalnotice> <para>Source code available at <uri xlink:href="https://version.mi.hdm-stuttgart.de/git/GoikLectures">https://version.mi.hdm-stuttgart.de/git/GoikLectures</uri></para> </legalnotice> </info> <chapter xml:id="prerequisites"> <title>Prerequisites</title> <section xml:id="resources"> <title>Lecture resources</title> <glosslist> <glossentry> <glossterm>Recommended books</glossterm> <glossdef> <itemizedlist> <listitem> <para><xref linkend="bib_fawcett2012"/></para> </listitem> <listitem> <para><xref linkend="bib_Walmsley02"/></para> </listitem> </itemizedlist> </glossdef> </glossentry> <glossentry> <glossterm>Lecture notes as PDF</glossterm> <glossdef> <para><uri xlink:href="http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/printversion.pdf">http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/printversion.pdf</uri></para> <caution> <para>Some figures and videos are left blank.</para> </caution> </glossdef> </glossentry> <glossentry> <glossterm>Live lecture additions</glossterm> <glossdef> <para><link xlink:href="https://cloud.mi.hdm-stuttgart.de/owncloud/public.php?service=files&amp;t=dae5c53f0a05d6661209527cee45d323">https://cloud.mi.hdm-stuttgart.de/owncloud/public.php?service=files&amp;t=dae5c53f0a05d6661209527cee45d323</link></para> </glossdef> </glossentry> <glossentry> <glossterm>List of exercises</glossterm> <glossdef> 
<para>The lecture notes contain exercises to be solved by you! A complete list is available at <uri xlink:href="http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/apb.html">http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/apb.html</uri>.</para> <para>You may also want to use the corresponding PDF version of the above table within <filename xlink:href="http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/printversion.pdf">printversion.pdf</filename> to keep track of your personal advances by filling in your completion status on individual exercises.</para> </glossdef> </glossentry> <glossentry> <glossterm><link linkend="gloss_Javadoc"><trademark>Javadoc</trademark></link> references and source code</glossterm> <glossdef> <para>The lecture notes contain a lot of <link linkend="gloss_Javadoc"><trademark>Javadoc</trademark></link> references. Most classes appearing within these lecture notes have <link linkend="gloss_Javadoc"><trademark>Javadoc</trademark></link> generated links to the source code as well. For example when clicking on the class name in <classname>sda.jdbc.intro.v1.SimpleInsert</classname> you will see the complete implementation.</para> </glossdef> </glossentry> <glossentry> <glossterm>Links to animated figures</glossterm> <glossdef> <para>The lecture notes' online version contains links to <uri xlink:href="http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.sda1/jdbcWrite.html">PDF images</uri>. 
Clicking on <quote>Animated PDF Version</quote> takes you to a PDF which, in the full screen mode of Acrobat Reader or <trademark>google-chrome</trademark>, provides a slide-like animation.</para> </glossdef> </glossentry> <glossentry> <glossterm><trademark>Virtualbox</trademark> image</glossterm> <glossdef> <para>A <productname xlink:href="https://www.virtualbox.org">Virtualbox</productname> image is available at <uri xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.rar">ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.rar</uri> or <link xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi">ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi</link>.</para> <caution> <para>Access from networks external to <uri>hdm-stuttgart.de</uri> requires <acronym>VPN</acronym> access.</para> </caution> <para>It contains (hopefully) all related tools from the <link xlink:href="http://www.mi.hdm-stuttgart.de">CSM</link> department's lecture room Linux installation:</para> <itemizedlist> <listitem> <para>Eclipse J2EE version with <productname xlink:href="http://www.eclipse.org/datatools">Database developer tools</productname>, <productname xlink:href="http://git-scm.com">git</productname>, <trademark xlink:href="http://oxygenxml.com">Oxygenxml</trademark>, <productname xlink:href="http://testng.org/doc/eclipse.html">TestNG</productname> and <productname xlink:href="http://subversion.apache.org/">svn</productname> plugins installed.</para> </listitem> <listitem> <para>A running <productname xlink:href="http://www.mysql.com/">Mysql</productname> server preconfigured with user <quote><code>hdmuser</code></quote>, password <quote><code>XYZ</code></quote> (<emphasis role="bold">capital letters!</emphasis>) and database <quote><code>hdm</code></quote>.</para> </listitem> <listitem> <para><productname xlink:href="http://www.xmlmind.com/xmleditor">Xmlmind XML editor</productname> for visually editing technical 
documents based on <productname xlink:href="http://docbook.org/tdg5/index.html">docbook</productname> or <productname xlink:href="http://en.wikipedia.org/wiki/Darwin_Information_Typing_Architecture">DITA</productname>.</para> </listitem> </itemizedlist> <caution> <para>This VM is only accessible from within the <orgname xlink:href="http://www.hdm-stuttgart.de">HdM</orgname> network. External downloads require <productname xlink:href="https://wiki.mi.hdm-stuttgart.de/wiki/VPN">OpenVPN</productname>.</para> </caution> <para>The virtual machine is based on the <productname xlink:href="http://lubuntu.net">Lubuntu</productname> fork of the <productname xlink:href="http://www.ubuntu.com">Ubuntu</productname> Linux distribution for resource saving reasons.</para> </glossdef> </glossentry> <glossentry xml:id="oxygenLicenseKey"> <glossterm><uri>Oxygen Xml Editor</uri> license key</glossterm> <glossdef> <para>This is the only software component in this lecture requiring a license. Your <orgname>HdM</orgname> affiliation entitles you to use the <productname xlink:href="http://oxygenxml.com/">Oxygenxml</productname> software for educational (non-commercial) purposes. The corresponding key is available at <uri xlink:href="ftp://mirror.mi.hdm-stuttgart.de/Firmen/Oxygen/Keys">ftp://mirror.mi.hdm-stuttgart.de/Firmen/Oxygen/Keys</uri>.</para> <para>This license key is compatible both with the standalone and the eclipse plugin version of the product.</para> <caution> <para>The license key's <abbrev xlink:href="http://en.wikipedia.org/wiki/File_Transfer_Protocol">ftp</abbrev> URL is only accessible from within the <orgname xlink:href="http://www.hdm-stuttgart.de">HdM</orgname> network. 
External access requires <link xlink:href="https://wiki.mi.hdm-stuttgart.de/wiki/VPN">Vpn activation</link>.</para> </caution> </glossdef> </glossentry> <glossentry> <glossterm>Source code of lecture resources</glossterm> <glossdef> <para>The complete lecture sources are available from <link xlink:href="https://version.mi.hdm-stuttgart.de/git/GoikLectures">https://version.mi.hdm-stuttgart.de/git/GoikLectures</link>.</para> <para>You may simply execute <quote><command xlink:href="http://git-scm.com/">git</command> <option>clone</option> <option>https://version.mi.hdm-stuttgart.de/git/GoikLectures</option> <option>.</option></quote> to check out the master tree.</para> </glossdef> </glossentry> <glossentry> <glossterm>Source code of exercises and examples</glossterm> <glossdef> <para>These sources contain a subdirectory <filename>ws/eclipse/Jdbc</filename> which can be imported as an eclipse project. This allows for browsing solutions to the exercises and executing sample applications. Import into eclipse works the following way:</para> <itemizedlist> <listitem> <para>When starting eclipse choose <filename>.../ws/eclipse</filename> as workspace</para> </listitem> <listitem> <para>In eclipse click <quote>File --> Import --> General --> Existing Projects into Workspace</quote>. After re-selecting the current workspace <filename>.../ws/eclipse</filename> the folder <filename>Jdbc</filename> should be on the list of importable projects.</para> <para>Depending on your eclipse installation you may have to adjust the <link linkend="gloss_Java"><trademark>Java</trademark></link> system libraries. Right click on your project root in the package explorer and choose <quote>Build Path --> Configure Buildpath</quote>. The <quote>JRE System Library</quote> entry in the <quote>Libraries</quote> tab may have to be changed to suit your eclipse's installation needs. 
You may want to create a dummy <link linkend="gloss_Java"><trademark>Java</trademark></link> project to find the correct setting.</para> </listitem> </itemizedlist> </glossdef> </glossentry> </glosslist> </section> <section xml:id="tools"> <title>Tools</title> <para>The subsequent sections describe tools that help you carry out the exercises. These descriptions target current Linux/Ubuntu systems. However, these tools are available for <trademark>Windows</trademark> or <trademark>Apple</trademark> systems as well. For the latter, some command line hints may have to be replaced by GUI-based tools.</para> <para>You may want to use the <link xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi">corresponding</link> <link xlink:href="https://www.virtualbox.org">Virtualbox image</link> containing a complete system, thus avoiding installation hassles. This should work well on reasonably current hardware.</para> <section xml:id="eclipse"> <title><productname xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname> and Eclipse</title> <para>So you like to take the hard way rather than using <link xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi">the virtualbox image</link>? Good! Real programmers tend to complicate things!</para> <para>The Eclipse IDE will be used as the primary coding tool, especially for <link linkend="gloss_Java"><trademark>Java</trademark></link> and XML. Users may use different tools such as <productname xlink:href="http://netbeans.org">Netbeans</productname> or <productname xlink:href="http://www.altova.com/de/xmlspy.html">XML-Spy</productname>. There are, however, some caveats:</para> <itemizedlist> <listitem> <para>Certain functionality may not be provided.</para> </listitem> <listitem> <para><orgname>HdM</orgname> staff support in case of trouble will be limited to coding and excludes tool support. 
In other words: You are on your own!</para> </listitem> </itemizedlist> <para>Installation of eclipse requires a suitable <link linkend="gloss_Java"><trademark>Java</trademark></link> Development Kit.</para> <caution> <para>Your <productname xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname> selection may be affected by your system's hardware. On a 64 bit system you may install either a 32 bit or a 64 bit <productname xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname>. If you subsequently install eclipse you must select the 32 or 64 bit version matching your <productname xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname> choice.</para> </caution> <para>Due to Oracle's (end-user unfriendly) licensing policy you may have to install this component manually. For <productname xlink:href="http://www.ubuntu.com">Ubuntu</productname> and <productname xlink:href="http://www.debian.org">Debian</productname> systems a standard (package manager compatible) procedure is described at <uri xlink:href="http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html">http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html</uri>. This boils down to the following commands (executed as user root or preceded by <command>sudo</command> <option>...</option>):</para> <programlisting language="none">add-apt-repository ppa:webupd8team/java
apt-get update
apt-get install oracle-jdk7-installer</programlisting> <para>During the installation process you will have to accept Oracle's license terms. If you do so, your acceptance will be cached and you will not be asked again when updating via <command>aptitude</command> <option>update</option>; <command>aptitude</command> <option>safe-upgrade</option>. 
After successful installation, executing <command xlink:href="http://www.oracle.com/us/technologies/java">java</command> <option>-version</option> in a shell should show something similar to:</para> <programlisting language="none">goik@goiki:~$ <emphasis role="bold">java -version</emphasis>
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) Server VM (build 23.3-b01, mixed mode)</programlisting> <para>The Eclipse IDE comes <link xlink:href="http://www.eclipse.org/downloads">in various flavours</link> depending on which plugins are bundled. For our purposes the <quote><productname>Eclipse Classic</productname></quote> <link linkend="gloss_Java"><trademark>Java</trademark></link> edition is sufficient. You may however want to install other flavours like <quote><productname>Eclipse IDE for Java EE Developers</productname></quote> if you require features beyond this course's needs. Remember to download the correct 32 or 64 bit version corresponding to your <productname xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname>.</para> <para>Follow <uri xlink:href="http://askubuntu.com/questions/26632/how-to-install-eclipse#answer-145018">http://askubuntu.com/questions/26632/how-to-install-eclipse#answer-145018</uri> to install eclipse on your system.</para> </section> <section xml:id="oxygenxmlInstall"> <title><productname xlink:href="http://oxygenxml.com">Oxygenxml</productname> plugin</title> <para>Go to <uri xlink:href="http://www.oxygenxml.com/download_oxygenxml_developer.html?os=Eclipse">http://www.oxygenxml.com/download_oxygenxml_developer.html?os=Eclipse</uri>. You may choose between the <quote>Plugin Update site</quote> and <quote>Plugin zip distribution</quote> installation method. 
The latter allows for better long-term eclipse plugin management.</para> <para>There are two different ways to install Eclipse plugins:</para> <itemizedlist> <listitem> <para>Use Eclipse's built-in update manager by <link xlink:href="http://www.oxygenxml.com/download_oxygenxml_developer.html?os=Eclipse#eclipse_install_instructions">defining a corresponding update site</link>.</para> </listitem> <listitem> <para>Unzip <filename>com.oxygenxml.developer_XYZ.zip</filename> in a subfolder of <filename>.../eclipse/dropins</filename> and restart eclipse (as root).</para> </listitem> </itemizedlist> <para>See <xref linkend="oxygenLicenseKey"/> for obtaining a license key. Alternatively, you may install the standalone version of the Oxygen XML Editor.</para> </section> <section xml:id="erMaster"> <title>ERMaster</title> <para>ERMaster allows visual editing of physical entity relationship diagrams. See the <link xlink:href="http://ermaster.sourceforge.net">installation instructions</link> for adding it on top of an existing eclipse installation.</para> </section> <section xml:id="testngInstall"> <title><foreignphrase>TestNG</foreignphrase> plugin</title> <para>Some exercises require the TestNG plugin to be installed in the Eclipse IDE. You may proceed in the same way as for <uri linkend="oxygenxmlInstall">Oxygenxml</uri>. 
According to <uri xlink:href="http://testng.org/doc/eclipse.html#eclipse-installation">http://testng.org/doc/eclipse.html#eclipse-installation</uri> the required Eclipse update site URL is <quote>http://beust.com/eclipse</quote>.</para> </section> <section xml:id="mysql"> <title><productname xlink:href="http://www.mysql.com">Mysql</productname> Database components</title> <para>We start by installing the <productname xlink:href="http://www.mysql.com">Mysql</productname> server:</para> <programlisting language="none">root@goiki:~# aptitude install mysql-server
The following NEW packages will be installed:
  libdbd-mysql-perl{a} libdbi-perl{a} libnet-daemon-perl{a} libplrpc-perl{a} mysql-client-5.5{a} mysql-server-5.5
0 packages upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/17.8 MB of archives. After unpacking 63.2 MB will be used.
Do you want to continue? [Y/n/?]</programlisting> <para>Hit <keycap>Y - return</keycap> to start. During the installation you will be asked for the <productname xlink:href="http://www.mysql.com">Mysql</productname> server's <quote>root</quote> (Administrator) password:</para> <programlisting language="none">Package configuration

┌───────────────────────────┤ Configuring mysql-server-5.5 ├────────────────────────────┐
│ While not mandatory, it is highly recommended that you set a password for the MySQL   │
│ administrative "root" user.                                                           │
│                                                                                       │
│ If this field is left blank, the password will not be changed.                        │
│                                                                                       │
│ New password for the MySQL "root" user:                                               │
│                                                                                       │
│ ********_____________________________________________________________________________ │
│                                                                                       │
│                                         &lt;Ok&gt;                                          │
│                                                                                       │
└───────────────────────────────────────────────────────────────────────────────────────┘
</programlisting> <para>This has to be entered twice. Keep a <emphasis role="bold">permanent</emphasis> record of this entry. 
Alternatively, set a bookmark to <uri xlink:href="https://help.ubuntu.com/community/MysqlPasswordReset">https://help.ubuntu.com/community/MysqlPasswordReset</uri> for later reference (and don't blame me!).</para> <para>At this point we should be able to connect to our newly installed server. We create a database <quote>hdm</quote> to be used for our exercises:</para> <programlisting language="none">goik@goiki:~$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 42
Server version: 5.5.24-0ubuntu0.12.04.1 (Ubuntu)

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> <emphasis role="bold">create database hdm;</emphasis>
Query OK, 1 row affected (0.00 sec)</programlisting> <para>Following <uri xlink:href="https://dev.mysql.com/doc/refman/5.5/en/adding-users.html">https://dev.mysql.com/doc/refman/5.5/en/adding-users.html</uri> we add a new user and grant full access to the newly created database:</para> <programlisting language="none">goik@goiki:~$ mysql -u root -p
Enter password:
...
mysql> CREATE USER 'hdmuser'@'localhost' IDENTIFIED BY 'XYZ';
mysql> use hdm;
mysql> GRANT ALL PRIVILEGES ON hdm.* TO 'hdmuser'@'localhost' WITH GRANT OPTION;
mysql> FLUSH PRIVILEGES;</programlisting> <para>The next step is optional. The <productname xlink:href="http://www.ubuntu.com">Ubuntu</productname> <productname xlink:href="http://www.mysql.com">Mysql</productname> server default configuration allows connections only via the <varname>loopback</varname> interface, i.e. <varname>localhost</varname>. 
If you want your <productname xlink:href="http://www.mysql.com">Mysql</productname> server to listen to the external network interface, comment out the bind-address parameter in <filename>/etc/mysql/my.cnf</filename>:</para> <programlisting language="none"># Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
# <emphasis role="bold">bind-address = 127.0.0.1</emphasis></programlisting> <para>Since we are dealing with <link linkend="gloss_Java"><trademark>Java</trademark></link>, a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> driver is needed to connect applications to our <productname xlink:href="http://www.mysql.com">Mysql</productname> server:</para> <programlisting language="none">root@goiki:~# aptitude install libmysql-java</programlisting> <para>This provides the file /usr/share/java/mysql-connector-java-5.1.16.jar and two symbolic links:</para> <programlisting language="none">goik@goiki:~$ cd /usr/share/java
goik@goiki:/usr/share/java$ ls -al mysql*
-rw-r--r-- 1 ... 2011 <emphasis role="bold">mysql-connector-java-5.1.16.jar</emphasis>
lrwxrwxrwx 1 ... 2011 <emphasis role="bold">mysql-connector-java.jar -> mysql-connector-java-5.1.16.jar</emphasis>
lrwxrwxrwx 1 ... 2011 <emphasis role="bold">mysql.jar -> mysql-connector-java.jar</emphasis></programlisting> </section> </section> <section xml:id="lectureNotes"> <title>Lecture related resources</title> <para>The sources for lecture notes and exercises are available from the <orgname xlink:href="http://www.mi.hdm-stuttgart.de">MIB</orgname> <productname xlink:href="http://git-scm.com">git</productname> repository:</para> <para><uri xlink:href="https://version.mi.hdm-stuttgart.de/git/GoikLectures">https://version.mi.hdm-stuttgart.de/git/GoikLectures</uri></para> <para>Check-out is straightforward:</para> <programlisting language="none">goik@goiki:~$ mkdir StructuredData;cd StructuredData
goik@goiki:~/StructuredData$ git clone https://version.mi.hdm-stuttgart.de/git/GoikLectures .
Cloning into '.'...
remote: Counting objects: 694, done
...
Resolving deltas: 100% (296/296), done.</programlisting> <para>After checkout an eclipse workspace holding the complete example source code becomes visible:</para> <programlisting language="none">goik@goiki:~/StructuredData$ cd ws/eclipse
goik@goiki:~/StructuredData/ws/eclipse$ ls -al
total 16
drwxr-xr-x 3 goik fb1prof 4096 Nov  8 22:04 .
drwxr-xr-x 4 goik fb1prof 4096 Nov  8 22:04 ..
-rw-r--r-- 1 goik fb1prof   11 Nov  8 22:04 .gitignore
<emphasis role="bold">drwxr-xr-x 6 goik fb1prof 4096 Nov  8 22:04 Jdbc</emphasis></programlisting> <para>The subdirectory <filename>Jdbc</filename> can be imported as an eclipse project via File --> Import --> General --> Existing Projects into Workspace. This should enable each participant to browse and execute the examples provided in the lecture notes. 
It also contains a <productname xlink:href="http://www.mysql.com">Mysql</productname> driver in Jdbc/lib/mysql-connector-java-5.1.16.jar, which is required to set up a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connection.</para> </section> <section xml:id="repeatRelational"> <title>Some notes on relational databases</title> <qandaset defaultlabel="qanda" xml:id="airlineRelationalSchema"> <title>Airlines, airports and flights</title> <qandadiv> <qandaentry> <question> <para>Implement a relational schema describing airlines, flights, airports and their respective relationships:</para> <itemizedlist> <listitem> <para>Airline:</para> <itemizedlist> <listitem> <para>An informal unique name, e.g. <quote>Lufthansa</quote>.</para> </listitem> <listitem> <para>A unique <link xlink:href="http://en.wikipedia.org/wiki/List_of_airline_codes">ICAO abbreviation</link>.</para> </listitem> </itemizedlist> </listitem> <listitem> <para>Destination:</para> <itemizedlist> <listitem> <para>Full name like <quote>Frankfurt am Main International</quote>.</para> </listitem> <listitem> <para>World airport code like <quote>FRA</quote>.</para> </listitem> </itemizedlist> </listitem> <listitem> <para>Flight:</para> <itemizedlist> <listitem> <para>A unique flight number, e.g. <quote>LH 4234</quote>.</para> </listitem> <listitem> <para>The <quote>owning</quote> airline.</para> </listitem> <listitem> <para>The originating airport.</para> </listitem> <listitem> <para>The destination airport.</para> </listitem> <listitem> <para>Constraint: origin and destination must differ. Hint: <productname>Mysql</productname> provides a syntactical means to implement this constraint. It will however not be enforced at runtime. 
Database vendors like Oracle, IBM/DB2, <productname>Sybase</productname>, <productname>Informix</productname> <abbrev>etc.</abbrev> support this type of runtime integrity constraint enforcement.</para> </listitem> </itemizedlist> </listitem> </itemizedlist> <para>Provide surrogate keys for all entities and provide names for all constraints (<abbrev>e.g.</abbrev> by defining <code>CONSTRAINT _PK_XYZ PRIMARY KEY(...)</code>).</para> </question> <answer> <programlisting language="sql">CREATE TABLE Airline (
   id INT NOT NULL
  ,name CHAR(20) NOT NULL
  ,airlineCode CHAR(5) NOT NULL

  ,CONSTRAINT _PK_Airline_id PRIMARY KEY(id)
  ,CONSTRAINT _UN_Airline_name UNIQUE(name)
  ,CONSTRAINT _UN_Airline_airlineCode UNIQUE(airlineCode)
);

CREATE TABLE Destination (
   id INT NOT NULL
  ,fullName CHAR(20) NOT NULL
  ,airportCode CHAR(5)

  ,CONSTRAINT _PK_Destination_id PRIMARY KEY(id)
  ,CONSTRAINT _UN_Destination_airportCode UNIQUE(airportCode)
);

CREATE TABLE Flight (
   id INT NOT NULL
  ,flightNumber CHAR(10) NOT NULL
  ,airline INT NOT NULL REFERENCES Airline
  ,origin INT NOT NULL REFERENCES Destination
  ,destination INT NOT NULL REFERENCES Destination

-- For yet unknown reasons the following alternative MySQL 5.1 syntax compatible
-- statements fail with message 'Cannot add foreign key constraint':
--  ,CONSTRAINT _FK_Flight_airline FOREIGN KEY(airline) REFERENCES Airline
--  ,CONSTRAINT _FK_Flight_origin FOREIGN KEY(origin) REFERENCES Destination
--  ,CONSTRAINT _FK_Flight_destination FOREIGN KEY(destination) REFERENCES Destination

  ,CONSTRAINT _PK_Flight_id PRIMARY KEY(id)
  ,CONSTRAINT _UN_Flight_flightNumber UNIQUE(flightNumber)
  ,CONSTRAINT _CK_Flight_origin_destination CHECK(NOT(origin = destination))
);</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="toolingConfigJdbc"> <title>Tooling: Configuring and using the <link xlink:href="http://www.eclipse.org/datatools">Eclipse database development</link> plugin</title> <para>For basic SQL communication the 
Eclipse environment offers a standard plugin (Database development). Establishing connections to a specific database server generally requires prior installation of a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> driver on the client side as being shown in the following video:</para> <figure xml:id="figureConfigJdbcDriver"> <title>Adding a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> Driver for <productname xlink:href="http://www.mysql.com">Mysql</productname> to the database plugin.</title> <mediaobject> <videoobject> <videodata fileref="Ref/Video/jdbcDriverConfig.mp4"/> </videoobject> </mediaobject> </figure> <para>During the exercises the eclipse database development perspective may be used to browse and structure SQL tables and data. The following video demonstrates the configuration of a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connection to a local (<varname>localhost</varname> network interface) database server. With respect to the introduction given in <xref linkend="mysql"/> we assume the existence of a database <code>hdm</code> and a corresponding account <quote>hdmuser</quote> and password <quote><code>XYZ</code></quote> (<emphasis role="bold">capital letters!</emphasis>) on our database server.</para> <figure xml:id="figureConfigJdbcConnection"> <title>Configuring a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connection to a (local) <productname xlink:href="http://www.mysql.com">Mysql</productname> database server.</title> <mediaobject> <videoobject> <videodata fileref="Ref/Video/jdbcConnection.mp4"/> </videoobject> </mediaobject> </figure> <para>We are now ready to communicate with our database server. 
The last video in this section shows some basic SQL tasks:</para> <figure xml:id="figureEclipseBasicSql"> <title>Executing SQL statements, browsing schema and retrieving data</title> <mediaobject> <videoobject> <videodata fileref="Ref/Video/eclipseBasicSql.mp4"/> </videoobject> </mediaobject> </figure> </section> </chapter> <chapter xml:id="xmlIntro"> <title>Introduction to XML</title> <section xml:id="xmlBasic"> <title>The XML industry standard</title> <para>A short question might be: <quote>What is XML?</quote> An answer might be: The acronym XML stands for <quote>E<emphasis>x</emphasis>tensible <emphasis>M</emphasis>arkup <emphasis>L</emphasis><foreignphrase>anguage</foreignphrase></quote> and is an industry standard published by the W3C standardization organization. As with other industry software standards, talking about XML means talking about XML based software: applications and frameworks that add value for software implementors and enhance data exchange between applications.</para> <para>Many readers are already familiar with XML without explicitly referring to the standard itself: The world wide web's <foreignphrase>lingua franca</foreignphrase> HTML has been ported to an XML dialect, forming the <link xlink:href="http://www.w3.org/MarkUp">XHTML</link> standard. 
The idea behind this standard is to distinguish between an abstract markup language and the rendered results a browser generates from so-called document instances:</para> <figure xml:id="renderXhtmlMarkup"> <title>Rendering XHTML markup</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xhtml.fig"/> </imageobject> </mediaobject> </figure> <para>XHTML is actually a good example to illustrate the tree-like, hierarchical structure of XML documents:</para> <figure xml:id="xhtmlTree"> <title>XHTML tree structure</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xhtmlexample.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>We may extend this example by representing a mathematical formula via a standard called <link xlink:href="http://www.w3.org/Math">MathML</link>:</para> <figure xml:id="mathmlExample"> <title>A formula in <link xlink:href="http://www.w3.org/Math">MathML</link> representation.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/sqrtrender.fig"/> </imageobject> </mediaobject> </figure> <para>Again we observe a similar situation: A database-like <emphasis>representation</emphasis> of a formula on the left and a <emphasis>rendered</emphasis> version on the right. Regarding XML we have:</para> <itemizedlist> <listitem> <para>The <link xlink:href="http://www.w3.org/Math">MathML</link> standard intended to describe mathematical formulas. The standard defines a set of <emphasis>tags</emphasis>, e.g. <tag class="starttag">math:msqrt</tag>, with well-defined semantics regarding permitted attribute values and nesting rules.</para> </listitem> <listitem> <para>Informal descriptions of formatting expectations.</para> </listitem> <listitem> <para>Software transforming an XML formula representation into visible or printable output. In other words: A rendering engine.</para> </listitem> </itemizedlist> <para>XML documents may also be regarded as a persistence mechanism to represent and store data. 
Similarities to relational database systems exist. An RDBMS (<emphasis>R</emphasis><foreignphrase>elational</foreignphrase> <emphasis>D</emphasis><foreignphrase>atabase</foreignphrase> <emphasis>M</emphasis><foreignphrase>anagement</foreignphrase> <emphasis>S</emphasis><foreignphrase>ystem</foreignphrase>) is typically capable of holding terabytes of data organized in tables. The arrangement of data may be subject to various constraints like candidate or foreign key rules. With respect to both end users and software developers an RDBMS itself is a building block in a complete solution. We need an application on top of it acting as a user interface to the data it contains.</para> <para>In contrast to an RDBMS, XML allows data to be organized hierarchically. The <link xlink:href="http://www.w3.org/Math">MathML</link> representation given in <xref linkend="mathmlExample"/> may be graphically visualized:</para> <figure xml:id="mathmltree"> <title>A tree graph representation of the <link xlink:href="http://www.w3.org/Math">MathML</link> example given before.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/sqrtree.fig"/> </imageobject> </mediaobject> </figure> <para>CAD applications may use XML documents as a representation of graphical primitives:</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/attributes.fig" scale="65"/> </imageobject> </mediaobject> </informalfigure> <para>Of course an RDBMS also allows the representation of tree-like structures or arbitrary graphs. But these have to be modelled by using foreign key constraints since relational tables themselves have a <quote>flat</quote> structure. 
Some RDBMS vendors provide extensions to the SQL standard which allow <quote>native</quote> representations of <link linkend="gloss_XML"><abbrev>XML</abbrev></link> documents.</para> </section> <section xml:id="xmlHtml"> <title>Well formed XML documents</title> <para>The general structure of an <link linkend="gloss_XML"><abbrev>XML</abbrev></link> document is as follows:</para> <figure xml:id="xmlbase"> <title><link linkend="gloss_XML"><abbrev>XML</abbrev></link> basic structure</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xmlbase.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>We explore a simple XML document representing messages like E-mails:</para> <figure xml:id="memoWellFormed"> <title>The representation of a short message.</title> <programlisting language="none"><?xml<co xml:id="first_xml_code_magic"/> version="1.0"<co xml:id="first_xml_code_version"/> encoding="UTF-8"<co xml:id="first_xml_code_encoding"/>?> <memo><co xml:id="first_xml_code_topelement"/> <from>M. Goik</from><co xml:id="first_xml_code_from"/> <to>B. King</to> <to>A. June</to> <subject>Best wishes</subject> <content>Hi all, congratulations to your splendid party</content> </memo></programlisting> </figure> <calloutlist> <callout arearefs="first_xml_code_magic"> <para>The very first characters <code><?xml</code> may be regarded as a <link xlink:href="http://en.wikipedia.org/wiki/Magic_number_(programming)">magic number string</link> being used as a format indicator which allows distinguishing between different file types, e.g. GIF, JPEG, HTML and so on.</para> </callout> <callout arearefs="first_xml_code_version"> <para>The <code>version="1.0"</code> attribute tells us that all subsequent lines will conform to the <link xlink:href="http://www.w3.org/TR/xml">XML</link> standard of version 1.0. This way a document can express its conformance to the version 1.0 standard even if in the future this standard evolves to a higher version e.g. 
<code>version="2.1"</code>.</para> </callout> <callout arearefs="first_xml_code_encoding"> <para>The attribute <code>encoding="UTF-8"</code> tells us that all text in the current document is stored as UTF-8, an encoding of the <link xlink:href="http://unicode.org">Unicode</link> character set. <link xlink:href="http://unicode.org">Unicode</link> is a widely accepted industry standard for character encoding. Thus European, Cyrillic and most Asian scripts are allowed to be used in documents <emphasis>simultaneously</emphasis>. Other encodings may limit the set of allowed characters, e.g. <code>encoding="ISO-8859-1"</code> will only allow characters belonging to western European languages. However a system also needs to have the corresponding fonts (e.g. TrueType) installed in order to render the document appropriately. A document containing Chinese characters is of no use if the underlying rendering system lacks e.g. a set of Chinese TrueType fonts.</para> </callout> <callout arearefs="first_xml_code_topelement"> <para>An XML document has exactly one top level <emphasis>node</emphasis>. In contrast to the HTML standard these nodes are commonly called elements rather than tags. In this example the top level (root) element is <tag class="starttag">memo</tag>.</para> </callout> <callout arearefs="first_xml_code_from"> <para>Each XML element like <tag class="starttag">from</tag> has a corresponding counterpart <tag class="endtag">from</tag>. In terms of XML we say each element being opened has to be closed. 
In conjunction with the preceding point this is equivalent to the fact that each XML document represents a tree structure as shown in the <link linkend="mathmltree">tree graph</link> representation.</para> </callout> </calloutlist> <para>As with the introductory formula example this representation itself is of limited usefulness: In an office environment we need a rendered version being given either as print or as some online format like E-Mail or HTML.</para> <para>From a software developer's point of view we may use a piece of software called a <emphasis>parser</emphasis> to test the document's standard conformance. At the MI department we may simply invoke <userinput><command>xmlparse</command> message.xml</userinput> to start a check:</para> <programlisting language="none"><errortext>goik>xmlparse wellformed.xml Parsing was successful</errortext></programlisting> <para>Various XML related plugins are supplied for the <productname xlink:href="http://eclipse.org">eclipse platform</productname> like the <productname xlink:href="http://oxygenxml.com">Oxygen software</productname> supplying <quote>live</quote> conformance checking while editing XML documents. Now we test our assumptions by violating some of the rules stated before. We deliberately omit the closing element <tag class="endtag">from</tag>:</para> <figure xml:id="omitFrom"> <title>An invalid XML document due to the omission of <tag class="endtag">from</tag>.</title> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <memo> <from>M. Goik <co xml:id="omitFromMissingElement"/> <to>B. King</to> <to>A. 
June</to> <subject>Best wishes</subject> <content>Hi all, congratulations to your splendid party</content> </memo></programlisting> <calloutlist> <callout arearefs="omitFromMissingElement"> <para>The opening element <tag class="starttag">from</tag> is not terminated by <tag class="endtag">from</tag>.</para> </callout> </calloutlist> </figure> <para>Consequently the parser's output reads:</para> <programlisting language="none"><errortext>goik>xmlparse omitfrom.xml file:///ma/goik/workspace/Vorlesungen/Input/Memo/omitfrom.xml:8:3: fatal error org.xml.sax.SAXParseException: The element type "from" must be terminated by the matching end-tag "</from>". parsing error</errortext></programlisting> <para>Experienced HTML authors may be confused: In fact HTML is not an XML standard. Instead HTML belongs to the set of SGML applications. SGML is a much older standard namely the <emphasis>Standard Generalized Markup Language</emphasis>.</para> <para>Even if every XML element has a closing counterpart the resulting XML may be invalid:</para> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <memo> <from>M. Goik<to>B. King</from></to> <to>A. June</to> <subject>Best wishes</subject> <content>Hi all, congratulations to your splendid party</content> </memo></programlisting> <para>The parser echoes:</para> <programlisting language="none"><computeroutput>file:///ma/goik/workspace/Vorlesungen/Input/Memo/nonest.xml:3:29: fatal error org.xml.sax.SAXParseException: The element type "to" must be terminated by the matching end-tag "</to>". parsing error</computeroutput></programlisting> <para>This type of error is caused by so-called improper nesting of elements: The element <tag class="starttag">from</tag> is closed before the <quote>inner</quote> element <tag class="starttag">to</tag> has been closed. Actually this violates the expressibility of XML documents as a tree-like structure. 
The situation may be resolved by choosing:</para> <programlisting language="none">...<from>M. Goik<to>B. King</to></from>...</programlisting> <para>We provide two examples illustrating proper and improper nesting of XML documents:</para> <figure xml:id="fig_nestingProper"> <title>Proper nesting of XML elements</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/propernest.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>The following example violates the proper nesting constraint and thus does not constitute an XML document:</para> <figure xml:id="fig_improperNest"> <title>Improperly nested elements</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/impropernest.fig"/> </imageobject> </mediaobject> </figure> <!-- goik:later <para>An animation showing the usage of the Oxygen plug in for the examples given above can be found <uri xlink:href="src/viewlet/wellformed/wellformed_viewlet_swf.html">here</uri>.</para> --> <para>XML elements may have so-called attributes like <tag class="attribute">date</tag> in the following example:</para> <figure xml:id="memoWellAttrib"> <title>An XML document with attributes.</title> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <memo date="10.02.2006" priority="high"> <from>M. Goik</from> <to>B. King</to> <to>A. June</to> <subject>Best wishes</subject> <content>Hi all, congratulations to your splendid party</content> </memo></programlisting> </figure> <para>The conformance of an XML document with the following rules may be verified by invoking a parser:</para> <itemizedlist> <listitem> <para>Within the <emphasis>scope</emphasis> of a given element an attribute name must be unique. In the example above one may not define a second attribute <varname>date="..."</varname> within the same element <memo ... >. 
This reflects the usual programming language semantics of attributes: In a <link linkend="gloss_Java"><trademark>Java</trademark></link> class an attribute is represented by a unique identifier and thus cannot appear twice.</para> </listitem> <listitem> <para>An attribute value must be enclosed either in single (') or double (") quotes. This is different from the HTML standard which allows attribute values without quotes provided the given attribute value does not give rise to ambiguities. For example <tag class="starttag">td align=left</tag> is allowed since the attribute value <tag class="attvalue">left</tag> does not contain any spaces thus allowing a parser to recognize the end of the value's definition.</para> </listitem> </itemizedlist> <qandaset defaultlabel="qanda" xml:id="example_memoAttribTree"> <title>A graphical representation of a memo.</title> <qandadiv> <qandaentry> <question> <para>Draw a graphical representation similar to the one in <xref linkend="mathmltree"/> of the memo document being given in <xref linkend="memoWellAttrib"/>.</para> </question> <answer> <para>The <link linkend="memoWellAttrib">memo document's</link> structure may be visualized as:</para> <informalfigure xml:id="memotreeFigure"> <para>A graphical representation of <xref linkend="memoWellAttrib"/>:</para> <informalfigure xml:id="memotreeFigureFalse"> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/memotree.fig"/> </imageobject> </mediaobject> </informalfigure> <para>The sequence of <emphasis>element</emphasis> child nodes is important in XML and has to be preserved. 
Only the order of the two attributes <tag class="attribute">date</tag> and <tag class="attribute">priority</tag> is undefined: They actually belong to the <tag class="starttag">memo</tag> node serving as a dictionary with the attribute names being the keys and the attribute values being the values of the dictionary.</para> </informalfigure> </answer> </qandaentry> <qandaentry xml:id="example_attribInQuotes"> <question> <label>Attributes and quotes</label> <para>As stated before XML attribute values have to be enclosed in single or double quotes. Construct an XML document with mixed quotes like <code><date day="monday'></code>. How does the parser react? Find the corresponding syntax definition of legal attribute values in the <link xlink:href="http://www.w3.org/TR/xml">XML standard W3C Recommendation</link>.</para> </question> <answer> <para>The parser flags a mixture of single and double quotes for a given attribute as an error. The XML standard <link xlink:href="http://www.w3.org/TR/xml#NT-AttValue">defines</link> the syntax of attribute values: An attribute value has to be enclosed <emphasis>either</emphasis> in two single <emphasis>or</emphasis> in two double quotes as defined in <uri xlink:href="http://www.w3.org/TR/xml/#NT-AttValue">http://www.w3.org/TR/xml/#NT-AttValue</uri>.</para> </answer> </qandaentry> <qandaentry xml:id="quoteInAttributValue"> <question> <label>Quotes as part of an attribute's value?</label> <para>Single and double quotes are used to delimit an attribute value. May quotes appear themselves as part of an attribute's value, e.g. like in a person's name <code>Gary "King" Mandelson</code>?</para> </question> <answer> <para>Attribute values may contain double quotes if the attribute's value is enclosed in single quotes and vice versa. 
As a limitation the value of an attribute may not contain literal single quotes and literal double quotes at the same time:</para> <informalfigure xml:id="exampleSingleDoubleQuotes"> <para>Quotes as part of attribute values.</para> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <test> <person name='Gary "King" Mandelson'/> <!-- o.k. --> <person name="Gary 'King' Mandelson"/> <!-- o.k. --> <person name="Gary 'King 'S.' "Mandelson"'/> <!-- oops! --> </test></programlisting> </informalfigure> </answer> </qandaentry> </qandadiv> </qandaset> <para>Some constraints being imposed on XML documents by the standard defined so far may be summarized as:</para> <itemizedlist> <listitem> <para>An XML document must have exactly one top level element.</para> </listitem> <listitem> <para>Elements have to be properly nested. An element must not be closed if an <quote>inner</quote> element is still open.</para> </listitem> <listitem> <para>Attribute names within a given element must be unique.</para> </listitem> <listitem> <para>Attribute values <emphasis>must</emphasis> be quoted correctly.</para> </listitem> </itemizedlist> <para>The rules on closing elements show one of several differences to the HTML standard: In HTML a lot of elements don't have to be closed. For example paragraphs (<tag class="starttag">p</tag>) or images (<tag class="starttag">img src='foo.gif'</tag>) don't have to be closed explicitly. This is due to the fact that HTML used to be defined in accordance with the older <emphasis><emphasis role="bold">S</emphasis>tandard <emphasis role="bold">G</emphasis>eneralized <emphasis role="bold">M</emphasis>arkup <emphasis role="bold">L</emphasis>anguage</emphasis> (SGML) standard.</para> <para>These constraints are part of the definition of a <link xlink:href="http://www.w3.org/TR/xml#sec-well-formed">well formed document</link>. 
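</para> <para>The rules summarized above can also be checked programmatically. The following is a minimal sketch, assuming the JAXP parser bundled with a standard Java runtime. The class name <classname>WellFormedCheck</classname> and the method <methodname>isWellFormed</methodname> are hypothetical names chosen for illustration and do not belong to the lecture's source code:</para> <programlisting language="java">import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the lecture sources.
public class WellFormedCheck {

    // Returns true if the given text can be parsed as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            // The default factory yields a non-validating parser.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing diagnostics.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // A fatal parse error signals a violated well-formedness rule.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One root element, proper nesting, quoted attribute value
        System.out.println(isWellFormed("<memo date='10.02.2006'><from>M. Goik</from><to>B. King</to></memo>"));
        // Missing </from> end tag
        System.out.println(isWellFormed("<memo><from>M. Goik<to>B. King</to></memo>"));
        // Improper nesting of <from> and <to>
        System.out.println(isWellFormed("<memo><from>M. Goik<to>B. King</from></to></memo>"));
        // Attribute name used twice within one element
        System.out.println(isWellFormed("<memo date='10.02.2006' date='11.02.2006'/>"));
    }
}</programlisting> <para>Only the first of the four sample strings obeys all rules; each of the remaining three violates exactly one of them and should therefore be rejected by the parser.</para> <para>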
The specification imposes additional constraints for a document to be well-formed.</para> </section> </chapter> <chapter xml:id="dtd"> <title>Beyond well-formedness</title> <section xml:id="motivationDdt"> <title>Motivation</title> <para>So far we are able to create XML documents containing hierarchically structured data. We may nest elements and thus create tree structures of arbitrary depth. The only restrictions being imposed by the XML standard are the constraints of well-formedness. For many purposes in software development this is not sufficient.</para> <para>A company named <productname>Softmail</productname> might implement an email system which uses <link linkend="memoWellAttrib">memo</link> document files as low level data representation serving as a persistence layer. Now a second company named <productname>Hardmail</productname> wants to integrate mails generated by <productname>Softmail</productname>'s system into its own business product. The <productname>Hardmail</productname> software developers might <emphasis>infer</emphasis> the logical structure of <productname>Softmail</productname>'s email representation but the following problems arise:</para> <itemizedlist> <listitem> <para>The logical structure will in practice become more complex: E-mails may contain attachments leading to multi part messages. Additional header information is required for standard Internet mail compliance. This adds further complexity to the XML structure being mandatory for data representation. Relying only on well-formedness the specification of an internal E-mail format can only be achieved <emphasis>informally</emphasis>. Thus a rule like <quote>Each E-mail must have a subject</quote> may be written down in the specification. 
A software developer will code these rules but probably make mistakes as the set of rules grows.</para> <para>In contrast an RDBMS based solution offers to solve such problems in a declarative manner: A developer may use a <code>NOT NULL</code> constraint on a subject attribute of type <code>VARCHAR</code> thus prohibiting missing subjects.</para> </listitem> <listitem> <para>As <productname>Softmail</productname>'s product evolves its internal E-mail XML format is subject to change due to functional extensions and possibly bug fixes both giving rise to interoperability problems.</para> </listitem> </itemizedlist> <para>Generally speaking well formed XML documents lack grammar constraints as being available for programming languages. In the case of an RDBMS developers can impose primary key, foreign key and <code>CHECK</code> constraints in a <emphasis>declarative</emphasis> manner rather than hard coding them into their applications (A solution bad programmers are in favour of though...). Various XML standards exist for declarative constraint definitions namely:</para> <itemizedlist> <listitem> <para>DTDs</para> </listitem> <listitem> <para><link xlink:href="http://www.w3.org/XML/Schema">XML Schema</link></para> </listitem> <listitem> <para><link xlink:href="http://www.relaxng.org">RelaxNG</link></para> </listitem> </itemizedlist> </section> <section xml:id="dtdBasic"> <title>XML Schema</title> <section xml:id="dtdFirstExample"> <title>Structural descriptions for documents</title> <para>As an example we choose documents of type <emphasis>memo</emphasis> as a starting point. 
Documents like the example from <xref linkend="memoWellAttrib"/> may be <emphasis>informally</emphasis> described to be a sequence of the following mandatory items:</para> <figure xml:id="figure_memo_informalconstraints"> <title>Informal constraints on <tag class="element">memo</tag> document instances</title> <itemizedlist> <listitem> <para><emphasis>Exactly one</emphasis> sender.</para> </listitem> <listitem> <para><emphasis>One or more</emphasis> recipients.</para> </listitem> <listitem> <para>Subject</para> </listitem> <listitem> <para>Content</para> </listitem> </itemizedlist> <para>In addition we have:</para> <itemizedlist> <listitem> <para>A date string <emphasis>must</emphasis> be supplied</para> </listitem> <listitem> <para>A priority <emphasis>may</emphasis> be supplied with allowed values to be chosen from the set of values <tag class="attvalue">low</tag>, <tag class="attvalue">medium</tag> or <tag class="attvalue">high</tag>.</para> </listitem> </itemizedlist> </figure> <para>All these fields contain ordinary text to be filled in by a user and shall appear exactly in the defined order. For simplicity we do not care about email address syntax rules being described in <link xlink:href="http://www.w3.org/Protocols/rfc822">RFC based address schemes</link>. We will see how the <emphasis>constraints</emphasis> mentioned above can be modelled in XML by an extension to the concept of well formed documents.</para> </section> <section xml:id="section_memo_machinereadable"> <title>A machine readable description</title> <para>We now introduce an example of an XML schema. It allows for the specification of additional constraints to both element nodes and their attributes. 
Our set of <link linkend="figure_memo_informalconstraints" revision="">informal constraints</link> on memo documents may be expressed as:</para> <figure xml:id="figure_memo_dtd"> <title>A schema to describe memo documents.</title> <programlisting language="none"><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified" vc:minVersion="1.0" vc:maxVersion="1.1"> <xs:element name="memo"> <xs:complexType> <xs:sequence> <co xml:id="memodtd_memodef"/> <xs:element name="from" type="xs:string"/> <co xml:id="memodtd_elem_from"/> <xs:element name="to" minOccurs="1" maxOccurs="unbounded" type="xs:string"/> <xs:element name="subject" type="xs:string"/> <xs:element name="content" type="xs:string"/> </xs:sequence> <xs:attribute name="date" type="xs:date" use="required"/> <co xml:id="memodtd_memo_attribs"/> <xs:attribute name="priority" type="Priority" use="optional"/> </xs:complexType> </xs:element> <xs:simpleType name="Priority"> <xs:restriction base="xs:string"> <xs:enumeration value="low"/> <xs:enumeration value="medium"/> <xs:enumeration value="high"/> </xs:restriction> </xs:simpleType> </xs:schema></programlisting> <calloutlist> <callout arearefs="memodtd_memodef"> <para>A <tag class="element">memo</tag> consists of a sender, at least one recipient, a subject and content.</para> </callout> <callout arearefs="memodtd_memo_attribs"> <para>A <tag class="element">memo</tag> has got one required attribute <varname>date</varname> and an optional attribute <varname>priority</varname> being restricted to the three allowed values <tag class="attvalue">low</tag>, <tag class="attvalue">medium</tag> and <tag class="attvalue">high</tag> being defined by a separate <tag class="starttag">xs:simpleType</tag> directive.</para> </callout> <callout arearefs="memodtd_elem_from"> <para>A <tag class="starttag">from</tag> element consists of ordinary text. This disallows XML markup. 
For example <code><from>Smith & partner</from></code> is disallowed since XML uses the ampersand (&) to denote the beginning of an entity like <tag class="genentity">auml</tag> for the German a-umlaut (ä). The correct form is <code><from>Smith &amp; partner</from></code> using the predefined entity <tag class="genentity">amp</tag> as an escape sequence for the ampersand.</para> <para><code>type="xs:string"</code> is a built in XML Schema type representing a restricted version of ordinary strings. Without digging into details a <code>xs:string</code> string must not contain any markup code like e.g. <tag class="starttag">msqrt</tag>. This ensures that a string does not interfere with the document's XML markup.</para> </callout> </calloutlist> </figure> <para>We notice our schema's syntax itself is an XML document.</para> <para>From the viewpoint of software modeling an XML Schema instance is a <emphasis>schema</emphasis> describing the syntax of a class of XML document instances adhering to it. 
In the context of XML technologies <link xlink:href="http://www.w3.org/XML/Schema">XML Schema</link> is one of several language alternatives which allow for XML document structure descriptions.</para> <para>Readers being familiar with <abbrev xlink:href="http://en.wikipedia.org/wiki/Backus-Naur_form">BNF</abbrev> or <abbrev xlink:href="http://en.wikipedia.org/wiki/Extended_Backus_Naur_form">EBNF</abbrev> will be able to understand the grammatical rules being expressed here.</para> <productionset> <title>A message of type <tag class="starttag">memo</tag></title> <production xml:id="memo.ebnf.memo"> <lhs>Memo Message</lhs> <rhs>'<memo>' <nonterminal def="#memo.ebnf.sender">Sender</nonterminal> [<nonterminal def="#memo.ebnf.recipient">Recipient</nonterminal>]+ <nonterminal def="#memo.ebnf.subject">Subject</nonterminal> <nonterminal def="#memo.ebnf.content">Content</nonterminal> '</memo>'</rhs> </production> <production xml:id="memo.ebnf.sender"> <lhs>Sender</lhs> <rhs>'<from>' <nonterminal def="#memo.ebnf.text"> Text </nonterminal> '</from>'</rhs> </production> <production xml:id="memo.ebnf.recipient"> <lhs>Recipient</lhs> <rhs>'<to>' <nonterminal def="#memo.ebnf.text"> Text </nonterminal> '</to>'</rhs> </production> <production xml:id="memo.ebnf.subject"> <lhs>Subject</lhs> <rhs>'<subject>' <nonterminal def="#memo.ebnf.text"> Text </nonterminal> '</subject>'</rhs> </production> <production xml:id="memo.ebnf.content"> <lhs>Content</lhs> <rhs>'<content>' <nonterminal def="#memo.ebnf.text"> Text </nonterminal> '</content>'</rhs> </production> <production xml:id="memo.ebnf.text"> <lhs>Text</lhs> <rhs>[a-zA-Z0-9]* <lineannotation>In real documents this is too restrictive!</lineannotation></rhs> </production> </productionset> <para>We may as well supply a graphical representation:</para> <figure xml:id="extendContModelGraph"> <title>Graphical representation of the extended <code>content</code> model.</title> <mediaobject> <imageobject> <imagedata 
fileref="Ref/Fig/contentmixed.fig"/> </imageobject> </mediaobject> </figure> <para>In comparison to our informal description of memo documents a schema offers an added value: The grammar is machine readable and may thus become input to a parser which in turn is enabled to check whether an XML document obeys the constraints being imposed. So the parser must be instructed to use a schema in addition to the XML document in question. For this purpose an XML document may define a reference to a schema:</para> <figure xml:id="memo_external_dtd"> <title>A memo document instance holding a reference to a document external schema.</title> <programlisting language="none"><memo <co xml:id="memo_external_dtd_top_element"/> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="memo.xsd" <co xml:id="memo_external_dtd_url"/> date="2014-09-24" priority="high"> <from>M. Goik</from> <to>B. King</to> <to>A. June</to> <subject>Best wishes</subject> <content>Hi all, congratulations to your splendid party</content> </memo></programlisting> <calloutlist> <callout arearefs="memo_external_dtd_top_element"> <para>The element <tag class="starttag">memo</tag> is chosen to be the top (root) element of the document's tree. It must be defined in our schema <filename>memo.xsd</filename>. This is really a choice since an XML schema defines a <emphasis>set</emphasis> of elements in <emphasis>arbitrary</emphasis> order. There is no such rule as <quote>define before use</quote>. So an XML schema does not tell us which element has to appear on top of a document.</para> <para>Suppose a given XML schema offers both <tag class="starttag">book</tag> and <tag class="starttag">report</tag> elements. An XML author writing a complex document will choose <tag class="starttag">book</tag> as top level element rather than <tag class="starttag">report</tag> being more appropriate for a small piece of documentation. 
Consequently it is an XML author's <emphasis>choice</emphasis> which of the elements being defined in a schema shall appear as <emphasis>the</emphasis> top level element.</para> </callout> <callout arearefs="memo_external_dtd_url"> <para>The address of the schema's rule set. In the given example it is just a filename but it may as well be an <link xlink:href="http://www.w3.org/Addressing">URL</link> of type <abbrev xlink:href="http://en.wikipedia.org/wiki/File_Transfer_Protocol">ftp</abbrev>, <abbrev xlink:href="http://www.w3.org/Protocols">http</abbrev> and so on, see <xref linkend="memoDtdOnFtp"/>.</para> </callout> </calloutlist> </figure> <para>In the presence of a schema parsing a document is actually a two-step process: First the parser will check the document for well-formedness. Then the parser will read the referenced schema <filename>memo.xsd</filename> and check the document for the additional constraints being defined within.</para> <para>In the current example both the schema and the XML memo document reside as text files in a common file system folder. For general use a schema is usually kept at a centralized location. The value of the attribute <varname>xsi:noNamespaceSchemaLocation</varname> is actually a <emphasis>U</emphasis><foreignphrase>niform</foreignphrase> <emphasis>R</emphasis><foreignphrase>esource</foreignphrase> <emphasis>L</emphasis><foreignphrase>ocator</foreignphrase> <link xlink:href="http://www.w3.org/Addressing">(URL)</link>. Thus our <filename>memo.xsd</filename> may also be supplied as an <abbrev xlink:href="http://www.w3.org/Protocols">http</abbrev> or <abbrev xlink:href="http://en.wikipedia.org/wiki/File_Transfer_Protocol">ftp</abbrev> <link xlink:href="http://www.w3.org/Addressing">URL</link>:</para> <figure xml:id="memoDtdOnFtp"> <title>A schema reference to an HTTPS server.</title> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <memo ... xsi:noNamespaceSchemaLocation="https://someserver.org/memo.xsd"> <from>M. 
Goik</from> ... </memo></programlisting> </figure> <para>Some terms are helpful in the context of schemas:</para> <variablelist> <varlistentry> <term>Validating / non-validating:</term> <listitem> <para>A non-validating parser only checks a document for well-formedness. If it also checks XML documents for conformance to a schema it is a <emphasis>validating</emphasis> parser.</para> </listitem> </varlistentry> <varlistentry> <term>Valid / invalid documents:</term> <listitem> <para>An XML document referencing a schema may either be valid or invalid depending on its conformance to the schema in question.</para> </listitem> </varlistentry> <varlistentry> <term>Document instance:</term> <listitem> <para>An XML memo document may conform to the <link linkend="figure_memo_dtd">memo schema</link>. In this case we call it a <emphasis>document instance</emphasis> of the memo schema.</para> <para>This situation is quite similar to typed programming languages: A <link linkend="gloss_Java"><trademark>Java</trademark></link> <code>class</code> declaration is a blueprint for the <link linkend="gloss_Java"><trademark>Java</trademark></link> runtime system to construct <link linkend="gloss_Java"><trademark>Java</trademark></link> objects in memory. This is done by e.g. a statement <code>String name = new String();</code>. The identifier <code>name</code> will hold a reference to an <emphasis>instance of class String</emphasis>. So in a <link linkend="gloss_Java"><trademark>Java</trademark></link> runtime environment a class declaration plays the same role as a schema declaration in XML. See also <xref linkend="example_memoJavaClass"/>.</para> </listitem> </varlistentry> </variablelist> <para>For further discussions it is very useful to clearly distinguish element definitions in a schema from their <emphasis>realizations</emphasis> in a corresponding document instance: Our memo schema defines an element <tag class="starttag">from</tag> to be of content <type>xs:string</type>. 
According to the schema exactly one <tag class="starttag">from</tag> element must appear in a valid (conforming) document instance. If we were talking about HTML document instances we would prefer to talk about a <tag class="starttag">from</tag> <emphasis>tag</emphasis> rather than a <tag class="starttag">from</tag> <emphasis>element</emphasis>.</para> <para>In this document we will use the term <emphasis>element type</emphasis> to denote an <code><xs:element ...</code> definition in a schema. Thus we will talk about an element type <tag class="element">subject</tag> being defined in <filename>memo.xsd</filename>.</para> <para>An element type being defined in a <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schema</abbrev> may have nodes in document instances as realizations. For example the document instance shown in <xref linkend="memo_external_dtd"/> has two <emphasis>nodes</emphasis> of element type <tag class="element">to</tag>. Thus we say that the document instance contains two <emphasis>element nodes</emphasis> of type <tag class="element">to</tag>. We will frequently abbreviate this by saying the instance contains two <tag class="starttag">to</tag> element nodes. And we may even omit the term <emphasis>nodes</emphasis> and simply talk about two <tag class="starttag">to</tag> elements. 
But the careful reader should always distinguish between a single type <code>foo</code> being defined in a <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schema</abbrev> and the possibly empty set of <tag class="starttag">foo</tag> nodes appearing in valid document instances.</para> <para><abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">Schemas</abbrev> build on top of well-formed XML documents:</para> <figure xml:id="wellformedandvalid"> <title>Well-formed and valid documents</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/wellformedandvalid.fig" scale="65"/> </imageobject> </mediaobject> </figure> <qandaset defaultlabel="qanda" xml:id="example_memoTestValid"> <title>Validation of memo document instances.</title> <qandadiv> <qandaentry> <question> <para>Copy the two files <link xlink:href="Ref/src/Memo.1/message.xml">message.xml</link> and <link xlink:href="Ref/src/Memo.1/memo.xsd">memo.xsd</link> into your eclipse project. Use the Oxygen XML plug-in to check if the document is valid. Then apply and undo each of the following changes in turn, checking the document for validity each time:</para> <itemizedlist> <listitem> <para>Omit the <tag class="starttag">from</tag> element.</para> </listitem> <listitem> <para>Change the order of the two sub elements <tag class="starttag">subject</tag> and <tag class="starttag">content</tag>.</para> </listitem> <listitem> <para>Erase the <varname>date</varname> attribute and its value.</para> </listitem> <listitem> <para>Erase the <varname>priority</varname> attribute and its value.</para> </listitem> </itemizedlist> <para>What do you observe?</para> </question> <answer> <para>The <tag class="attribute">priority</tag> attribute is declared as <code>optional</code> and may thus be omitted. Erasing the <tag class="attribute">priority</tag> attribute thus leaves the document in a valid state. 
The remaining three edit actions yield an invalid document instance.</para> </answer> </qandaentry> <qandaentry xml:id="example_memoJavaClass"> <question> <label>A memo implementation sketch in Java</label> <para>The aim of this exercise is to clarify the (abstract) relation between XML <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schema</abbrev>s and sets of <link linkend="gloss_Java"><trademark>Java</trademark></link> classes rather than building a running application. We want to model the <link xlink:href="Ref/src/Memo.1/memo.xsd">memo schema</link> as a set of <link linkend="gloss_Java"><trademark>Java</trademark></link> classes.</para> </question> <answer> <para>The XML attributes <tag class="attribute">date</tag> and <tag class="attribute">priority</tag> can be mapped to <link linkend="gloss_Java"><trademark>Java</trademark></link> attributes. The same applies to the Memo elements <tag class="element">from</tag>, <tag class="element">subject</tag> and <tag class="element">content</tag> which may be implemented as simple Strings or alternatively as separate classes wrapping the String content. The latter method of implementation should be preferred if the Memo schema is expected to grow in complexity. A simple sketch reads:</para> <programlisting language="none">import java.util.Date;
import java.util.SortedSet;

public class Memo {
  private Date date;                              // XML attribute "date"
  private Priority priority = Priority.standard;  // XML attribute "priority"
  private String from, subject, content;          // elements having text content
  private SortedSet<String> to;                   // recipients, duplicates excluded

  // Accessors not yet implemented
}</programlisting> <para>The only thing to note here is the implementation of the <tag class="element">to</tag> element: We want to be able to address a <emphasis>set</emphasis> of recipients. Thus we have to disallow duplicates. Note that this is an <emphasis>informal</emphasis> constraint not being handled by our schema: A Memo document instance <emphasis>may</emphasis> have duplicate content in <tag class="starttag">to</tag> nodes. 
This is a limitation of our current <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schema</abbrev> rather than of XML Schema itself: A uniqueness constraint on the content of a set of document nodes can be expressed by an <tag class="starttag">xs:unique</tag> or <tag class="starttag">xs:key</tag> identity constraint, which the memo schema does not use.</para> <para>On the other hand our set of recipients has to be ordered: In an XML document instance the order of <tag class="starttag">to</tag> nodes is important and has to be preserved in a <link linkend="gloss_Java"><trademark>Java</trademark></link> representation. The sketch uses a <classname>java.util.SortedSet</classname> parametrized with String type to rule out duplicates. Be aware, however, that a <classname>java.util.SortedSet</classname> keeps its elements in their natural (here: lexicographic) order rather than in document order. A <classname>java.util.LinkedHashSet</classname> rejects duplicates as well and preserves insertion order, so it meets both requirements more faithfully.</para> <para>Our schema defines:</para> <programlisting language="none"><!ATTLIST memo ... priority (low|medium|high) #IMPLIED></programlisting> <para>Starting from <link linkend="gloss_Java"><trademark>Java</trademark></link> 1.5 we may implement this constraint by a type-safe enumeration in a file <filename>Priority.java</filename>:</para> <programlisting language="none">public enum Priority {low, standard, high}</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> <para>In the following chapters we will extend the memo document type (<code><!DOCTYPE memo ... ></code>) to demonstrate various concepts of <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schema</abbrev>s and other XML-related standards. In parallel a series of exercises deals with building a schema usable to edit books. This schema gets extended as our knowledge about XML advances. 
We start with an initial exercise:</para> <qandaset defaultlabel="qanda" xml:id="example_bookDtd"> <title>A schema for editing books</title> <qandadiv> <qandaentry> <question> <para>Write a schema describing book document instances with the following features:</para> <itemizedlist> <listitem> <para>A book shall have a title to describe the book itself.</para> </listitem> <listitem> <para>A book shall have at least one but possibly a sequence of chapters.</para> </listitem> <listitem> <para>Each chapter shall have a title and at least one paragraph.</para> </listitem> <listitem> <para>The titles and paragraphs shall consist of ordinary text.</para> </listitem> </itemizedlist> </question> <answer> <para>A possible schema looks like:</para> <figure xml:id="figure_book.dtd_v1"> <title>A first schema version for book documents</title> <programlisting language="none"><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified" vc:minVersion="1.0" vc:maxVersion="1.1"> <xs:element name="book"> <xs:complexType> <xs:sequence> <xs:element ref="title"/> <xs:element ref="chapter" minOccurs="1" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="title" type="xs:string"/> <xs:element name="chapter"> <xs:complexType> <xs:sequence> <xs:element ref="title"/> <xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="para" type="xs:string"/> </xs:schema></programlisting> </figure> <para>We supply a valid document instance:</para> <informalfigure xml:id="bookInitialInstance"> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="book.xsd"> <title>Introduction to Java</title> <chapter> <title>Introduction</title> <para>Java is a programming language</para> </chapter> <chapter> <title>The virtual 
machine</title> <para>We also call it the runtime system.</para> </chapter> <chapter> <title>Annotations</title> <para>Annotations provide a means to add meta information.</para> <para>This is especially useful for framework authors.</para> </chapter> </book></programlisting> </informalfigure> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="dtdVsSqlDdl"> <title>Relating <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schema</abbrev>s and <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> - <abbrev xlink:href="http://en.wikipedia.org/wiki/Data_definition_language">DDL</abbrev></title> <para>XML <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schema</abbrev>s and <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> - <abbrev xlink:href="http://en.wikipedia.org/wiki/Data_definition_language">DDL</abbrev> are related: They both describe data models and thus integrity constraints. 
We consider a simple invoice example:</para> <figure xml:id="invoiceIntegrity"> <title>Invoice integrity constraints</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/invoicedata.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>A relational implementation may look like:</para> <figure xml:id="invoiceSqlDdl"> <title>Relational implementation</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/invoicedataimplement.fig" scale="65"/> </imageobject> </mediaobject> </figure> <qandaset defaultlabel="qanda" xml:id="qandaInvoiceSchema"> <title>An XML schema representing invoices</title> <qandadiv> <qandaentry> <question> <para>Represent the relational schema being described in <xref linkend="invoiceSqlDdl"/> by an XML Schema and provide an appropriate instance example.</para> </question> <answer> <para>A possible schema implementation:</para> <programlisting language="none"><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified" vc:minVersion="1.0" vc:maxVersion="1.1"> <xs:simpleType name="money"> <xs:restriction base="xs:decimal"> <xs:fractionDigits value="2"/> </xs:restriction> </xs:simpleType> <xs:element name="data"> <xs:complexType> <xs:sequence> <xs:element ref="customer" maxOccurs="unbounded"/> <xs:element ref="invoice" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:key name="customerId"> <xs:selector xpath="customer"/> <xs:field xpath="@id"/> </xs:key> <xs:keyref refer="customerId" name="customerToInvoice"> <xs:selector xpath="invoice"/> <xs:field xpath="@customer"></xs:field> </xs:keyref> </xs:element> <xs:element name="customer"> <xs:complexType> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="phoneNumber" type="xs:string" minOccurs="0"/> </xs:sequence> <xs:attribute name="id" type="xs:int" use="required"/> </xs:complexType> </xs:element> <xs:element name="invoice"> <xs:complexType> <xs:sequence> 
<xs:element name="amount" type="money"/> <xs:element name="status"> <xs:simpleType> <xs:restriction base="xs:token"> <xs:enumeration value="open"/> <xs:enumeration value="due"/> <xs:enumeration value="cleared"/> </xs:restriction> </xs:simpleType> </xs:element> </xs:sequence> <xs:attribute name="customer" type="xs:int" use="required"/> </xs:complexType> </xs:element> </xs:schema></programlisting> <para>An example data set:</para> <programlisting language="none"><data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="invoice.xsd"> <customer id="5"> <name>Clarke Jefferson</name> </customer> <invoice customer="5"> <amount>33.12</amount> <status>due</status> </invoice> </data></programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="airlineXsd"> <title>The airline example revisited</title> <qandaset defaultlabel="qanda" xml:id="qandaAirlineXsd"> <title>Airline meta information by XML schema</title> <qandadiv> <qandaentry> <question> <para>Transform the relational schema from <xref linkend="airlineRelationalSchema"/> into an XML schema and supply some test data. In particular consider the following constraints:</para> <itemizedlist> <listitem> <para>Data types</para> <itemizedlist> <listitem> <para><link xlink:href="http://en.wikipedia.org/wiki/List_of_airline_codes">ICAO airline designator</link></para> </listitem> <listitem> <para><link xlink:href="http://en.wikipedia.org/wiki/International_Civil_Aviation_Organization_airport_code">ICAO airport code</link></para> </listitem> </itemizedlist> </listitem> <listitem> <para>Primary / Unique key definitions</para> </listitem> <listitem> <para>Foreign key definitions</para> </listitem> <listitem> <para>CHECK constraint: Your XML schema will require <tag class="starttag">xs:assert test="..." </tag> and thus XML schema version 1.1. 
You may want to read about co-occurrence constraints as described in <link xlink:href="http://www.ibm.com/developerworks/library/x-xml11pt2">Listing 6. Assertion on complex type - @height < @width</link>.</para> </listitem> </itemizedlist> <para>The following XML example instance may guide you towards an <filename>airline.xsd</filename> schema:</para> <programlisting language="none"><top xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="airline.xsd"> <airlines> <airline airlineCode="DLH" id="1"> <name>Lufthansa</name> </airline> <airline airlineCode="AFR" id="2"> <name>Air France</name> </airline> </airlines> <destinations> <destination id="1" airportCode="EDDF"> <fullName>Frankfurt International Airport – Frankfurt am Main</fullName> </destination> <destination id="3" airportCode="EBCI"> <fullName>Brussels South Charleroi Airport – Charleroi</fullName> </destination> </destinations> <flights> <flight id="1" airline="2" origin="1" destination="3"> <flightNumber>LH 4234</flightNumber> </flight> </flights> </top></programlisting> <para>Hints:</para> <itemizedlist> <listitem> <para>Identify all relational schema constraints from the solution of <xref linkend="airlineRelationalSchema"/> and model them accordingly.</para> </listitem> <listitem> <para>The above example does not contain any constraint violations. 
To test your schema for completeness, it may help to tinker with primary key, unique and referencing attribute values.</para> </listitem> </itemizedlist> </question> <answer> <programlisting language="none"><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified" vc:minVersion="1.1"> <xs:simpleType name="ICAOAirportCode"> <xs:restriction base="xs:string"> <xs:length value="4" /> <xs:pattern value="[A-Z0-9]+"></xs:pattern> </xs:restriction> </xs:simpleType> <xs:simpleType name="ICAOAirlineCode"> <xs:restriction base="xs:string"> <xs:length value="3"/> <xs:pattern value="[A-Z]+"></xs:pattern> </xs:restriction> </xs:simpleType> <xs:element name="top"> <xs:complexType> <xs:sequence> <xs:element ref="airlines"/> <xs:element ref="destinations"/> <xs:element ref="flights"/> </xs:sequence> </xs:complexType> <xs:keyref name="_FK_Flight_airline" refer="_PK_Airline_id"> <xs:selector xpath="flights/flight"/> <xs:field xpath="@airline"/> </xs:keyref> <xs:keyref name="_FK_Flight_origin" refer="_PK_Destination_id"> <xs:selector xpath="flights/flight"/> <xs:field xpath="@origin"/> </xs:keyref> <xs:keyref name="_FK_Flight_destination" refer="_PK_Destination_id"> <xs:selector xpath="flights/flight"/> <xs:field xpath="@destination"/> </xs:keyref> </xs:element> <xs:element name="airlines"> <xs:complexType> <xs:sequence> <xs:element ref="airline" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:key name="_PK_Airline_id"> <xs:selector xpath="airline"/> <xs:field xpath="@id"/> </xs:key> <xs:key name="_UN_Airline_name"> <xs:selector xpath="airline"/> <xs:field xpath="name"/> </xs:key> <xs:key name="_UN_Airline_airlineCode"> <xs:selector xpath="airline"/> <xs:field xpath="@airlineCode"/> </xs:key> </xs:element> <xs:element name="airline"> <xs:complexType> <xs:sequence> <xs:element name="name" type="xs:string"/> </xs:sequence> <xs:attribute name="id" type="xs:int" 
use="required"/> <xs:attribute name="airlineCode" type="ICAOAirlineCode" use="required"/> </xs:complexType> </xs:element> <xs:element name="destinations"> <xs:complexType> <xs:sequence> <xs:element ref="destination" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:key name="_PK_Destination_id"> <xs:selector xpath="destination"/> <xs:field xpath="@id"/> </xs:key> <xs:key name="_UN_Destination_airportCode"> <xs:selector xpath="destination"/> <xs:field xpath="@airportCode"/> </xs:key> </xs:element> <xs:element name="destination"> <xs:complexType> <xs:sequence> <xs:element name="fullName"/> </xs:sequence> <xs:attribute name="id" type="xs:int"/> <xs:attribute name="airportCode" type="ICAOAirportCode"/> </xs:complexType> </xs:element> <xs:element name="flights"> <xs:complexType> <xs:sequence> <xs:element ref="flight" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:key name="_PK_Flight_id"> <xs:selector xpath="flight"/> <xs:field xpath="@id"/> </xs:key> <xs:key name="_UN_Flight_flightNumber"> <xs:selector xpath="flight"/> <xs:field xpath="flightNumber"/> </xs:key> </xs:element> <xs:element name="flight"> <xs:complexType> <xs:sequence> <xs:element name="flightNumber" type="xs:string"/> </xs:sequence> <xs:attribute name="id" type="xs:int" use="required"/> <xs:attribute name="airline" type="xs:int" use="required"/> <xs:attribute name="origin" type="xs:int"/> <xs:attribute name="destination" type="xs:int"/> <xs:assert test="not(@origin = @destination)"> <xs:annotation> <xs:documentation>CHECK constraint _CK_Flight_origin_destination</xs:documentation> </xs:annotation> </xs:assert> </xs:complexType> </xs:element> </xs:schema></programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="xmlAndJava"> <title>Relating <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schema</abbrev>'s and <link linkend="gloss_Java"><trademark>Java</trademark></link> class 
descriptions.</title> <para>We may also compare XML data constraints to <link linkend="gloss_Java"><trademark>Java</trademark></link>. A <link linkend="gloss_Java"><trademark>Java</trademark></link> class declaration is actually a blueprint for a <trademark xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark> to instantiate compatible objects. Likewise an XML schema acts as a blueprint restricting the set of well-formed documents to the valid ones:</para> <figure xml:id="fig_XmlAndJava"> <title>XML <abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration">schema</abbrev>s and <link linkend="gloss_Java"><trademark>Java</trademark></link> class declarations.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xmlattribandjava.fig" scale="65"/> </imageobject> </mediaobject> </figure> </section> <section xml:id="xmlSchemaExercise"> <title>XML schema exercises</title> <section xml:id="sectSchemaProductCatalog"> <title>A product catalog</title> <qandaset defaultlabel="qanda" xml:id="quandaProductCatalog"> <title>Product catalog schema</title> <qandadiv> <qandaentry> <question> <para>Consider the following product catalog example:</para> <programlisting language="none"><catalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="catalog.xsd"> <title>Outdoor products</title> <introduction> <para>We offer a great variety of basic stuff for mountaineering such as ropes, harnesses and tents.</para> <para>Our shop is proud of its large number of available sleeping bags.</para> </introduction> <product id="x-223"> <title>Multi freezing bag Nightmare camper</title> <description> <para>You will feel comfortable till minus 20 degrees - At least if you are a penguin or a polar bear.</para> </description> </product> <product id="r-334"> <title>Rope 40m</title> <description> <para>Excellent for indoor climbing.</para> </description> </product> </catalog></programlisting> <para>As you may have inferred the following rules shall apply to arbitrary 
catalog documents:</para> <itemizedlist> <listitem> <para>Each <tag class="starttag">catalog</tag> shall have exactly one <tag class="starttag">title</tag> and <tag class="starttag">introduction</tag> element.</para> </listitem> <listitem> <para><tag class="starttag">introduction</tag> and <tag class="starttag">description</tag> shall have at least one <tag class="starttag">para</tag> child.</para> </listitem> <listitem> <para>Each <tag class="starttag">catalog</tag> shall have at least one <tag class="starttag">product</tag>.</para> </listitem> <listitem> <para>Each <tag class="starttag">product</tag> shall have exactly one <tag class="starttag">title</tag> and at least one <tag class="starttag">para</tag> child element.</para> </listitem> <listitem> <para>The required <code>id</code> attribute shall not contain whitespace and be unique with respect to all <tag class="starttag">product</tag> elements.</para> </listitem> <listitem> <para>The attribute price shall represent money amounts and be optional.</para> </listitem> </itemizedlist> <para>Provide a suitable <filename>catalog.xsd</filename> schema.</para> </question> <answer> <programlisting language="none"><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified" vc:minVersion="1.0" vc:maxVersion="1.1"> <xs:simpleType name="money"> <xs:restriction base="xs:decimal"> <xs:fractionDigits value="2"/> </xs:restriction> </xs:simpleType> <xs:element name="title" type="xs:string"/> <xs:element name="para" type="xs:string"/> <xs:element name="description" type="paraSequence"/> <xs:element name="introduction" type="paraSequence"/> <xs:complexType name="paraSequence"> <xs:sequence> <xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:element name="product"> <xs:complexType> <xs:sequence> <xs:element ref="title"/> <xs:element ref="description"/> </xs:sequence> <xs:attribute name="id" 
type="xs:NMTOKEN" use="required"/> <xs:attribute name="price" type="money" use="optional"/> </xs:complexType> </xs:element> <xs:element name="catalog"> <xs:complexType> <xs:sequence> <xs:element ref="title"/> <xs:element ref="introduction"/> <xs:element ref="product" minOccurs="1" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:key name="uniqueProductId"> <xs:selector xpath="product"></xs:selector> <xs:field xpath="@id"/> </xs:key> </xs:element> </xs:schema></programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="sectQandaBookV1"> <title>Book-like documents</title> <qandaset defaultlabel="qanda" xml:id="example_operatorprecedence"> <title>Book documents with mixed content and itemized lists</title> <qandadiv> <qandaentry xml:id="example_book_v2"> <question> <para>Extend the first version of <link linkend="example_bookDtd">book.xsd</link> to support the following features:</para> <itemizedlist> <listitem> <para>Within a <tag class="starttag">chapter</tag> node <tag class="starttag">para</tag> and <tag class="starttag">itemizedlist</tag> elements in arbitrary order shall be allowed.</para> </listitem> <listitem> <para><tag class="starttag">itemizedlist</tag> nodes shall contain at least one <tag class="starttag">listitem</tag>.</para> </listitem> <listitem> <para><tag class="starttag">listitem</tag> nodes shall be composed of one or more <tag class="starttag">para</tag> or nested <tag class="starttag">itemizedlist</tag> elements.</para> </listitem> <listitem> <para>Within a <tag class="starttag">para</tag> we want to be able to emphasize text passages.</para> </listitem> </itemizedlist> <para>The following sample document instance shall be valid:</para> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="book.xsd"> <title>Introduction to Java</title> <chapter> <title>Introduction</title> <para>Java supports <emphasis>lots</emphasis> of concepts:</para> <itemizedlist> 
<listitem> <para>Single <emphasis>implementation</emphasis> inheritance.</para> </listitem> <listitem> <para>Multiple <emphasis>interface</emphasis> inheritance.</para> <itemizedlist> <listitem><para>Built in types</para></listitem> <listitem><para>User defined types</para></listitem> </itemizedlist> </listitem> </itemizedlist> </chapter> </book></programlisting> </question> <answer> <para>An extended schema looks like:</para> <figure xml:id="paraListEmphasize"> <title>Version 2 of book.xsd</title> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified" vc:minVersion="1.0" vc:maxVersion="1.1"> <xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/03/xml.xsd" /> <xs:include schemaLocation="table.xsd"/> <!-- Type definitions --> <xs:simpleType name="languageType"> <xs:restriction base="xs:string"> <xs:enumeration value="en"/> <xs:enumeration value="fr"/> <xs:enumeration value="de"/> <xs:enumeration value="it"/> <xs:enumeration value="es"/> </xs:restriction> </xs:simpleType> <!-- Elements having no inner structure --> <xs:element name="emphasis" type="xs:string"/> <xs:element name="title" type="xs:string"/> <xs:element name="link"> <xs:complexType mixed="true"> <xs:attribute name="linkend" type="xs:IDREF" use="required"/> </xs:complexType> </xs:element> <!-- Starting the game ... 
--> <xs:element name="book"> <xs:complexType> <xs:sequence> <xs:element ref="title"/> <xs:element ref="chapter" minOccurs="1" maxOccurs="unbounded"/> </xs:sequence> <xs:attribute name="lang" type="languageType" use="optional"/> </xs:complexType> </xs:element> <xs:element name="chapter"> <xs:complexType> <xs:sequence> <co xml:id="figure_book.dtd_v2_chapter"/> <xs:element ref="title"/> <xs:choice minOccurs="1" maxOccurs="unbounded"> <xs:element ref="para"/> <xs:element ref="itemizedlist"/> <xs:element ref="table"/> </xs:choice> </xs:sequence> <xs:attribute name="id" type="xs:ID" use="optional"/> <xs:attribute ref="xml:base"/> <!-- This allows for <xi:include ...> --> </xs:complexType> </xs:element> <xs:element name="para"> <xs:complexType mixed="true"> <co xml:id="figure_book.dtd_v2_para"/> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element ref="emphasis"/> <xs:element ref="link"/> </xs:choice> <xs:attribute name="id" type="xs:ID" use="optional"/> </xs:complexType> </xs:element> <xs:element name="itemizedlist"> <xs:complexType> <xs:sequence> <xs:element ref="listitem" minOccurs="1" <co xml:id="figure_book.dtd_v2_itemizedlist"/> maxOccurs="unbounded"/> </xs:sequence> <xs:attribute name="id" type="xs:ID" use="optional"/> </xs:complexType> </xs:element> <xs:element name="listitem"> <xs:complexType> <xs:choice minOccurs="1" maxOccurs="unbounded"> <co xml:id="figure_book.dtd_v2_listitem"/> <xs:element ref="para"/> <xs:element ref="itemizedlist"/> </xs:choice> </xs:complexType> </xs:element> </xs:schema></programlisting> <caption> <para>This allows emphasized text in <tag class="starttag">para</tag> nodes and <tag class="starttag">itemizedlists</tag>.</para> </caption> </figure> <calloutlist> <callout arearefs="figure_book.dtd_v2_chapter"> <para>We hook into <tag class="starttag">chapter</tag> to allow arbitrary sequences of at least one <tag class="starttag">para</tag> or <tag class="starttag">itemizedlist</tag> element node.</para> </callout> <callout 
arearefs="figure_book.dtd_v2_para"> <para><tag class="starttag">para</tag> nodes now allow mixed content.</para> </callout> <callout arearefs="figure_book.dtd_v2_itemizedlist"> <para>An <tag class="starttag">itemizedlist</tag> contains at least one list item.</para> </callout> <callout arearefs="figure_book.dtd_v2_listitem"> <para>A <tag class="starttag">listitem</tag> contains a sequence of at least one <tag class="starttag">para</tag> or <tag class="starttag">itemizedlist</tag> child node. The latter gives rise to nested lists. We find a similar construct in HTML namely unnumbered lists defined by <code><UL><LI>... </code>constructs.</para> </callout> </calloutlist> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="sectQandaBookLang"> <title>Allow different languages</title> <qandaset defaultlabel="qanda" xml:id="example_book.dtd_v3"> <title>book.xsd and languages</title> <qandadiv> <qandaentry> <question> <para>We want to extend our schema from <xref linkend="example_book_v2"/> by allowing an author to define the language to be used within the whole or parts of the document in question. Add an attribute <code>lang</code> to all relevant elements like e.g. <tag class="starttag">para lang="es"</tag>. 
An XML editor may use this attribute to activate corresponding dictionaries for spell checking.</para> <para>The <code>lang</code> attribute shall be restricted to the following values:</para> <itemizedlist> <listitem> <para><token>en</token></para> </listitem> <listitem> <para><token>fr</token></para> </listitem> <listitem> <para><token>de</token></para> </listitem> <listitem> <para><token>it</token></para> </listitem> <listitem> <para><token>es</token></para> </listitem> </itemizedlist> </question> <answer> <para>We define a suitable global <tag class="starttag">xs:attribute</tag> declaration:</para> <programlisting language="none"><xs:attribute <emphasis role="bold">name="lang"</emphasis>> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="en"/> <xs:enumeration value="fr"/> <xs:enumeration value="de"/> <xs:enumeration value="it"/> <xs:enumeration value="es"/> </xs:restriction> </xs:simpleType> </xs:attribute></programlisting> <para>Then we add this attribute to our elements like <tag class="starttag">chapter</tag> and others:</para> <programlisting language="none"> <xs:element name="chapter"> <xs:complexType> <xs:sequence> ... </xs:sequence> <xs:attribute <emphasis role="bold">ref="lang"</emphasis> use="optional"/> ... </xs:complexType> </xs:element></programlisting> <para>This allows us to set a language at an arbitrary hierarchy level. But of course we may define it at the top level as well:</para> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <book ... 
lang="en"> <title>Introduction to Java</title> ...</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="sectMixQuotes"> <title>Mixing attribute quotes</title> <qandaset defaultlabel="qanda" xml:id="example_quotes"> <title>Single and double quotes reconsidered</title> <qandadiv> <qandaentry> <question> <para>We recall the problem of nested quotes yielding XML code which is not well-formed:</para> <programlisting language="none"><img src="bold.gif" alt="We may use "quotes" here" /></programlisting> <para>The XML specification defines legal attribute values and related literals as:</para> <productionset> <title><link xlink:href="http://www.w3.org/TR/2008/REC-xml-20081126/#d0e888">Literals</link></title> <production xml:id="w3RecXml_NT-EntityValue"> <lhs>EntityValue</lhs> <rhs>'"' ([^%&"] | <nonterminal def="#w3RecXml_NT-PEReference">PEReference</nonterminal> | <nonterminal def="#w3RecXml_NT-Reference">Reference</nonterminal>)* '"' | "'" ([^%&'] | <nonterminal def="#w3RecXml_NT-PEReference">PEReference</nonterminal> | <nonterminal def="#w3RecXml_NT-Reference">Reference</nonterminal>)* "'"</rhs> </production> <production xml:id="w3RecXml_NT-AttValue"> <lhs>AttValue</lhs> <rhs>'"' ([^<&"] | <nonterminal def="#w3RecXml_NT-Reference">Reference</nonterminal>)* '"' | "'" ([^<&'] | <nonterminal def="#w3RecXml_NT-Reference">Reference</nonterminal>)* "'"</rhs> </production> <production xml:id="w3RecXml_NT-SystemLiteral"> <lhs>SystemLiteral</lhs> <rhs>('"' [^"]* '"') | ("'" [^']* "'")</rhs> </production> <production xml:id="w3RecXml_NT-PubidLiteral"> <lhs>PubidLiteral</lhs> <rhs>'"' <nonterminal def="#w3RecXml_NT-PubidChar">PubidChar</nonterminal>* '"' | "'" (<nonterminal def="#w3RecXml_NT-PubidChar">PubidChar</nonterminal> - "'")* "'"</rhs> </production> <production xml:id="w3RecXml_NT-PubidChar"> <lhs>PubidChar</lhs> <rhs>#x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]</rhs> </production> </productionset> <para>Find out how it is possible to 
set the attribute <tag class="attribute">alt</tag>'s value to the string <code>We may use "quotes" here</code>.</para> </question> <answer> <para>The production rule for attribute values reads:</para> <productionset> <productionrecap linkend="w3RecXml_NT-AttValue"/> </productionset> <para>This allows us to use either of two alternatives to delimit attribute values:</para> <glosslist> <glossentry> <glossterm><tag class="starttag">img ... alt="..."/</tag></glossterm> <glossdef> <para><emphasis>Well-formedness constraint:</emphasis> do not use a literal <code>"</code> inside the value string.</para> </glossdef> </glossentry> <glossentry> <glossterm><tag class="starttag">img ... alt='...'/</tag></glossterm> <glossdef> <para><emphasis>Well-formedness constraint:</emphasis> do not use a literal <code>'</code> inside the value string.</para> </glossdef> </glossentry> </glosslist> <para>We may take advantage of the second rule:</para> <programlisting language="none"><img src="bold.gif" alt='We may use "quotes" here' /></programlisting> <para>Notice that according to <xref linkend="w3RecXml_NT-AttValue"/> the opening and closing delimiter of a value must be of the same kind; they must not be mixed. 
The following code is thus not well-formed:</para> <programlisting language="none"><img src="bold.gif'/></programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="qandasetInternalRef"> <title>Internal references</title> <qandaset defaultlabel="qanda" xml:id="example_book.dtd_v5"> <title>book.xsd and internal references</title> <qandadiv> <qandaentry> <question> <para>We want to extend the schema from <xref linkend="example_book.dtd_v3"/> to allow for document internal references by:</para> <itemizedlist> <listitem> <para>Allowing each <tag class="starttag">chapter</tag>, <tag class="starttag">para</tag> and <tag class="starttag">itemizedlist</tag> to become reference targets.</para> </listitem> <listitem> <para>Extending the element <tag class="element">para</tag>'s mixed content model by a new element <tag class="element">link</tag> with an attribute <tag class="attribute">linkend</tag> being a reference to a target.</para> </listitem> </itemizedlist> </question> <answer> <para>We extend our schema:</para> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified" vc:minVersion="1.0" vc:maxVersion="1.1"> <xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/03/xml.xsd" /> <xs:include schemaLocation="table.xsd"/> <!-- Type definitions --> <xs:attribute name="lang"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="en"/> <xs:enumeration value="fr"/> <xs:enumeration value="de"/> <xs:enumeration value="it"/> <xs:enumeration value="es"/> </xs:restriction> </xs:simpleType> </xs:attribute> <!-- Elements having no inner structure --> <xs:element name="emphasis" type="xs:string"/> <xs:element name="title" type="xs:string"/> <xs:element name="link"> <xs:complexType mixed="true"> <co xml:id="progamlisting_book_v5_link"/> 
<xs:attribute name="linkend" <co xml:id="progamlisting_book_v5_link_linkend"/> type="xs:IDREF" use="required"/> </xs:complexType> </xs:element> <!-- Starting the game ... --> <xs:element name="book"> <xs:complexType> <xs:sequence> <xs:element ref="title"/> <xs:element ref="chapter" minOccurs="1" maxOccurs="unbounded"/> </xs:sequence> <xs:attribute ref="lang" use="optional"/> </xs:complexType> </xs:element> <xs:element name="chapter"> <xs:complexType> <xs:sequence> <xs:element ref="title"/> <xs:choice minOccurs="1" maxOccurs="unbounded"> <xs:element ref="para"/> <xs:element ref="itemizedlist"/> <xs:element ref="table"/> </xs:choice> </xs:sequence> <xs:attribute ref="lang" use="optional"/> <xs:attribute name="id" <co xml:id="progamlisting_book_v5_chapter_id"/> type="xs:ID" use="optional"/> <xs:attribute ref="xml:base"/> <!-- This allows for <xi:include ...> --> </xs:complexType> </xs:element> <xs:element name="para"> <xs:complexType mixed="true"> <co xml:id="progamlisting_book_v5_mixed_link"/> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element ref="emphasis"/> <xs:element ref="link"/> </xs:choice> <xs:attribute ref="lang" use="optional"/> <xs:attribute name="id" <co xml:id="progamlisting_book_v5_para_id"/> type="xs:ID" use="optional"/> </xs:complexType> </xs:element> <xs:element name="itemizedlist"> <xs:complexType> <xs:sequence> <xs:element ref="listitem" minOccurs="1" maxOccurs="unbounded"/> </xs:sequence> <xs:attribute ref="lang" use="optional"/> <xs:attribute name="id" type="xs:ID" use="optional"/> </xs:complexType> </xs:element> <xs:element name="listitem"> <xs:complexType> <xs:choice minOccurs="1" maxOccurs="unbounded"> <xs:element ref="para"/> <xs:element ref="itemizedlist"/> </xs:choice> <xs:attribute ref="lang" use="optional"/> </xs:complexType> </xs:element> </xs:schema></programlisting> <calloutlist> <callout arearefs="progamlisting_book_v5_chapter_id"> <para>Defining an attribute <tag class="attribute">id</tag> of type <code>ID</code> for the 
elements <tag class="element">chapter</tag>, <tag class="element">para</tag> and <tag class="element">itemizedlist</tag>. This enables an author to define internal reference targets.</para> </callout> <callout arearefs="progamlisting_book_v5_mixed_link"> <para>A link is part of the element <tag class="element">para</tag>'s mixed content model. Thus an author may define internal references along with ordinary text.</para> </callout> <callout arearefs="progamlisting_book_v5_link"> <para>As in HTML a link may contain text. When converted to HTML it is expected to be rendered as a hypertext link.</para> </callout> <callout arearefs="progamlisting_book_v5_link_linkend"> <para>The attribute <tag class="attribute">linkend</tag> holds the reference to an internal target being either a <tag class="element">chapter</tag>, a <tag class="element">para</tag> or an <tag class="element">itemizedlist</tag>.</para> </callout> </calloutlist> </answer> </qandaentry> </qandadiv> </qandaset> </section> </section> </section> </chapter> <chapter xml:id="xsl"> <title>The Extensible Stylesheet Language XSL</title> <para>XSL is a <link xlink:href="http://www.w3.org/Style/XSL">W3C standard</link> which defines a language to transform XML documents into the following output formats:</para> <itemizedlist> <listitem> <para>Ordinary text e.g. in <link xlink:href="http://unicode.org">Unicode</link> encoding</para> </listitem> <listitem> <para>XML</para> </listitem> <listitem> <para>HTML</para> </listitem> <listitem> <para>XHTML</para> </listitem> </itemizedlist> <para>Transforming a source XML document into a target XML document may be required if:</para> <itemizedlist> <listitem> <para>The target document expresses similar semantics but uses a different XML dialect i.e. different tag names.</para> </listitem> <listitem> <para>The target document is only a view on the source document. 
We may for example extract the chapter names from a <tag class="starttag">book</tag> document to create a table of contents.</para> </listitem> </itemizedlist> <section xml:id="xsl_helloworld"> <title>A <quote>Hello, world</quote> <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> example</title> <para>We start from an extended version of our <filename>memo.xsd</filename>:</para> <programlisting language="none"><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified" vc:minVersion="1.0" vc:maxVersion="1.1"> <xs:element name="memo"> <xs:complexType> <xs:sequence> <xs:element name="from" type="Person"/> <xs:element name="to" type="Person" minOccurs="1" maxOccurs="unbounded"/> <xs:element name="subject" type="xs:string"/> <xs:element ref="content"/> </xs:sequence> <xs:attribute name="date" type="xs:date" use="required"/> <xs:attribute name="priority" type="Priority" use="optional"/> </xs:complexType> </xs:element> <xs:complexType name="Person"> <xs:simpleContent> <xs:extension base="xs:string"> <xs:attribute name="id" type="xs:ID"/> </xs:extension> </xs:simpleContent> </xs:complexType> <xs:element name="content"> <xs:complexType> <xs:sequence> <xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="para"> <xs:complexType mixed="true"> <xs:sequence> <xs:element ref="link" minOccurs="0"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="link"> <xs:complexType mixed="true"> <xs:simpleContent> <xs:extension base="xs:string"> <xs:attribute name="linkend" type="xs:IDREF"/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:simpleType name="Priority"> <xs:restriction base="xs:string"> <xs:enumeration value="low"/> <xs:enumeration value="medium"/> <xs:enumeration value="high"/> </xs:restriction> </xs:simpleType> </xs:schema></programlisting> <para>This schema allows a memo's 
document content to be structured into paragraphs. A paragraph may contain links either to the sender or to a recipient.</para> <figure xml:id="figure_memoref_instance"> <title>A memo document instance with an internal reference.</title> <programlisting language="none"><memo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="memo.xsd" date="2014-09-24" priority="high" > <from <emphasis role="bold">id="goik"</emphasis>>Martin Goik</from> <to>Adam Hacker</to> <to id="eve">Eve Intruder</to> <subject>Firewall problems</subject> <content> <para>Thanks for your excellent work.</para> <para>Our firewall is definitely broken! This bug has been reported by the <link <emphasis role="bold">linkend="goik"</emphasis>>sender</link>.</para> </content> </memo></programlisting> </figure> <para>We want to extract the sender's name from an arbitrary <tag class="element">memo</tag> document instance. Using <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> this task can be accomplished by a script <filename>memo2sender.xsl</filename>:</para> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:output method="text"/> <xsl:template match="/memo"> <xsl:value-of select="from"/> </xsl:template> </xsl:stylesheet></programlisting> <para>Before examining this code more closely we first show its effect. We need a piece of software called an <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor. It reads both a <tag>memo</tag> document instance and a style sheet and produces the following output:</para> <programlisting language="none"><computeroutput>[goik@mupter Memoref]$ xml2xml message.xml memo2sender.xsl Martin Goik</computeroutput></programlisting> <para>The result is the sender's name <computeroutput>Martin Goik</computeroutput>. 
We may sketch the transformation principle:</para> <figure xml:id="figure_xsl_principle"> <title>An <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor transforming an XML document into a result using a stylesheet</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xslconvert.fig"/> </imageobject> </mediaobject> </figure> <para>The executable <filename>xml2xml</filename> defined at the MI department is actually a script wrapping the <productname xlink:href="http://saxon.sourceforge.net">Saxon XSLT processor</productname>. We may also use the Eclipse/Oxygen plugin replacing the shell command by a GUI <link xlink:href="http://www.oxygenxml.com/doc/ug-editorEclipse/#topics/defining-new-transformation-scenario.html">as described in the corresponding documentation</link>. Next we examine the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> example code more closely:</para> <programlisting language="none"><xsl:stylesheet <co xml:id="programlisting_helloxsl_stylesheet"/> xmlns:xsl <co xml:id="programlisting_helloxsl_namespace_abbv"/> ="http://www.w3.org/1999/XSL/Transform" version="2.0" <co xml:id="programlisting_helloxsl_xsl_version"/> > <xsl:output method="text" <co xml:id="programlisting_helloxsl_method_text"/>/> <xsl:template <co xml:id="programlisting_helloxsl_template"/> match <co xml:id="programlisting_helloxsl_match"/> ="/memo"> <xsl:value-of <co xml:id="programlisting_helloxsl_value-of"/> select <co xml:base="" xml:id="programlisting_helloxsl_valueof_select_att"/> ="from" /> </xsl:template> </xsl:stylesheet></programlisting> <calloutlist> <callout arearefs="programlisting_helloxsl_stylesheet"> <para>The element stylesheet belongs to the namespace <code>http://www.w3.org/1999/XSL/Transform</code>. This namespace is <emphasis>represented</emphasis> by the literal <literal>xsl</literal>. 
As an alternative we might also use <tag class="starttag">stylesheet xmlns="http://www.w3.org/1999/XSL/Transform"</tag> instead of <tag class="starttag">xsl:stylesheet ...</tag>. The value of the namespace itself gets defined next.</para> </callout> <callout arearefs="programlisting_helloxsl_namespace_abbv"> <para>The keyword <code>xmlns</code> is reserved by the <link xlink:href="http://www.w3.org/TR/REC-xml-names/">Namespaces in XML</link> specification. In <quote>pure</quote> XML the whole term <code>xmlns:xsl</code> would simply define an attribute. In the presence of a namespace aware XML parser however the literal <literal>xsl</literal> represents the attribute value <tag class="attvalue">http://www.w3.org/1999/XSL/Transform</tag>. This value <emphasis>must not</emphasis> be changed! Otherwise an XSL processor will fail since it cannot distinguish XSL instructions from other XML elements. After all, an element <tag class="starttag">stylesheet</tag> belonging to a different namespace <code>http://someserver.org/SomeNamespace</code> may have to be generated.</para> </callout> <callout arearefs="programlisting_helloxsl_xsl_version"> <para>The <link xlink:href="http://www.w3.org/TR/xslt20">XSL standard</link> is still evolving. The version number identifies the conformance level for the subsequent code.</para> </callout> <callout arearefs="programlisting_helloxsl_method_text"> <para>The <tag class="attribute">method</tag> attribute in the <link xlink:href="http://www.w3.org/TR/xslt20/#element-output"><xsl:output></link> element specifies the type of output to be generated. Depending on this type we may also define indentation depths and/or encoding. 
Allowed <tag class="attvalue">method</tag> values are:</para> <glosslist> <glossentry> <glossterm>text</glossterm> <glossdef> <para>Ordinary text.</para> </glossdef> </glossentry> <glossentry> <glossterm>html</glossterm> <glossdef> <para><link xlink:href="http://www.w3.org/TR/html4">HTML</link> markup.</para> </glossdef> </glossentry> <glossentry> <glossterm>xhtml</glossterm> <glossdef> <para><link xlink:href="http://www.w3.org/TR/xhtml1">XHTML</link> markup differing from the former by e.g. the closing <quote>/></quote> in <tag><img src="..."/></tag>.</para> </glossdef> </glossentry> <glossentry> <glossterm>xml</glossterm> <glossdef> <para>XML code. This is most commonly used to create views on or different dialects of an XML document instance.</para> </glossdef> </glossentry> </glosslist> </callout> <callout arearefs="programlisting_helloxsl_template"> <para>An <tag class="starttag">xsl:template</tag> defines the output that will be created for document nodes being defined by a selector.</para> </callout> <callout arearefs="programlisting_helloxsl_match"> <para>The attribute <tag class="attribute">match</tag> tells us for which nodes of a document instance the given <tag class="starttag">xsl:template</tag> is appropriate. In the given example the value <code>/memo</code> tells us that the template is only responsible for <tag class="element">memo</tag> nodes appearing at top level i.e. being the root element of the document instance.</para> </callout> <callout arch="" arearefs="programlisting_helloxsl_value-of programlisting_helloxsl_valueof_select_att"> <para>A <tag class="element">value-of</tag> element writes content to the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor's output. 
In this example the <code>#PCDATA</code> content from the element <tag class="element">from</tag> will be written to the output.</para> </callout> </calloutlist> </section> <section xml:id="xpath"> <title><link xlink:href="http://www.w3.org/TR/xpath">XPath</link> and node sets</title> <para>The <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> standard allows us to retrieve node sets from XML documents by predicate based queries. Thus its role may be compared to <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> <code>SELECT</code> ... <code>FROM</code> ...<code>WHERE</code> queries. Some simple examples:</para> <figure xml:id="fig_Xpath"> <title>Simple <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> queries</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xpath.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>We are now interested in a list of all recipients being defined in a <tag class="element">memo</tag> element. We introduce the element <tag class="element">xsl:for-each</tag> which iterates over a result set of nodes:</para> <figure xml:id="programlisting_tolist_xpath"> <title>Iterating over the list of recipient nodes.</title> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:output method="text"/> <xsl:template match="/" <co xml:id="programlisting_tolist_match_root"/>> <xsl:for-each select="memo/to" <co xml:id="programlisting_tolist_xpath_memo_to"/> > <xsl:value-of select="." 
<co xml:id="programlisting_tolist_value_of"/> /> <xsl:text>,</xsl:text> <co xml:id="programlisting_tolist_xsl_text"/> </xsl:for-each> </xsl:template> </xsl:stylesheet></programlisting> </figure> <calloutlist> <callout arearefs="programlisting_tolist_match_root"> <para>This template matches the XML document instance, <emphasis>not</emphasis> the visible <tag class="element"><memo></tag> node.</para> </callout> <callout arearefs="programlisting_tolist_xpath_memo_to"> <para>The <link xlink:href="http://www.w3.org/TR/xpath">XPath</link> expression <tag class="attvalue">memo/to</tag> gets evaluated starting from the invisible top level document node being the context node. For the given document instance this will define a result set containing both <tag class="element"><to></tag> recipient nodes, see <xref linkend="figure_memo_xpath_memo_to"/>.</para> </callout> <callout arearefs="programlisting_tolist_value_of"> <para>The dot <quote>.</quote> represents the <code>#PCDATA</code> content of the current <tag class="element">to</tag> element.</para> </callout> <callout arearefs="programlisting_tolist_xsl_text"> <para>A comma is appended. This is not quite correct since it should be absent for the last element.</para> </callout> </calloutlist> <figure xml:id="figure_recipientlist_trailing_comma"> <title>A list of recipients.</title> <para>The <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> presented before yields:</para> <programlisting language="none"><computeroutput>Adam Hacker,Eve Intruder</computeroutput><emphasis role="bold">,</emphasis></programlisting> </figure> <para>Right now we do not bother about the trailing <quote>,</quote> after the last recipient. The surrounding <code><xsl:text></code>,<code></xsl:text></code> elements <emphasis>may</emphasis> be omitted. We encourage the reader to leave them in place since they increase readability when a template's body gets more complex. 
The element <tag class="starttag">xsl:text</tag> is used to append static text to the output. This way we append a separator after each recipient. We now discuss the role of the two attributes <tag class="attribute">match="/"</tag> and <tag class="attribute">select="memo/to"</tag>. Both are examples of so-called <link xlink:href="http://www.w3.org/TR/xpath">XPath</link> expressions. They allow us to define <emphasis>node sets</emphasis> being subsets from the set of all nodes from a given document instance.</para> <para>Conceptually <link xlink:href="http://www.w3.org/TR/xpath">XPath</link> expressions may be compared to the <acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym> language the latter allowing the retrieval of data<emphasis>sets</emphasis> from a relational database. We illustrate the current example by a figure:</para> <figure xml:id="figure_memo_xpath_memo_to"> <title>Selecting node sets from <tag class="element">memo</tag> document instances</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/memoxpath.fig"/> </imageobject> </mediaobject> </figure> <para>This figure needs some explanation. We observe an additional node <quote>above</quote> <tag class="starttag">memo</tag> being represented as <quote>filled</quote>. This node represents the document instance as a whole and has <tag>memo</tag> as its only child. We will rediscover this additional root node when we discuss the <abbrev xlink:href="http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407">DOM</abbrev> application programming interface.</para> <para>As already mentioned the expression <code>memo/to</code> evaluates to a <emphasis>set</emphasis> of nodes. In our example this set consists of two nodes of type <tag class="starttag">to</tag> each of them representing a recipient of the memo. 
We observe a subtle difference between the two <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expressions:</para> <glosslist> <glossentry> <glossterm><code>match="/"</code></glossterm> <glossdef> <para>The expression starts with and actually consists solely of the string <quote>/</quote>. Thus it can be called an <emphasis>absolute</emphasis> <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression. Like a file specification <filename>C:\dos\myprog.exe</filename> it starts on top level and needs no further context information to get evaluated.</para> <para>An <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet <emphasis>must</emphasis> have an <link xlink:href="http://www.w3.org/TR/xslt20/#initiating">initial context node</link> to start the transformation. This is achieved by providing exactly one <tag class="starttag">xsl:template</tag> with an absolute <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> value for its <tag class="attribute">match</tag> attribute like <tag class="attvalue">/memo</tag>.</para> </glossdef> </glossentry> <glossentry> <glossterm><code>select="memo/to"</code></glossterm> <glossdef> <para>This expression can be compared to a <emphasis>relative</emphasis> file path specification like e.g. <filename>../images/hdm.gif</filename>. We need to add the base (context) directory in order for a relative file specification to become meaningful. If the base directory is <filename>/home/goik/xml</filename> then this <emphasis>relative</emphasis> file specification will address the file <filename>/home/goik/images/hdm.gif</filename>.</para> <para>Likewise we have to define a <emphasis>context</emphasis> node if we want to evaluate a relative <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression. In our example this is the root node. 
The XSL specification introduces the term <link xlink:href="http://www.w3.org/TR/xslt20/#context">evaluation context</link> for this purpose.</para> </glossdef> </glossentry> </glosslist> <para>In order to explain relative <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expressions we consider <code>content/para</code> starting from the (unique!) <tag class="element">memo</tag> node:</para> <figure xml:id="memoXpathPara"> <title>The node set represented by <code>content/para</code> starting at the context node <tag class="starttag">memo</tag>.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/memorelativexpath.fig"/> </imageobject> <caption> <para>The dashed lines represent the relative <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expressions starting from the context node to each of the nodes in the result set.</para> </caption> </mediaobject> </figure> </section> <section xml:id="xsl_important_elements"> <title>Some important <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> elements</title> <section xml:id="xsl_if"> <title><tag class="starttag">xsl:if</tag></title> <para>Sometimes we need conditional processing rules. We might want to create a list of the sender and those recipients having a defined value for the attribute <tag class="attribute">id</tag>. In the <link linkend="figure_memoref_instance">given example</link> this only holds for the (unique) sender and the recipient <code><to id="eve">Eve Intruder</to></code>. We assume this set of persons shall be inserted into a relational database table <code>Customer</code> consisting of two <code>NOT NULL</code> columns <code>id</code> and <code>name</code>. 
Thus both attributes <emphasis>must</emphasis> be specified and we must exclude <tag class="starttag">from</tag> or <tag class="starttag">to</tag> nodes with undefined <tag class="attribute">id</tag> attributes:</para> <figure xml:id="programlisting_memo_export_sql"> <title>Exporting SQL statements.</title> <programlisting language="none">... <xsl:variable name="newline" <co xml:id="programlisting_xsl_if_definevar"/>> <!-- A newline \n --> <xsl:text> </xsl:text> </xsl:variable> <xsl:template match="/memo"> <xsl:for-each select="from|to" <co xml:id="programlisting_xsl_if_foreach"/>> <xsl:if <emphasis role="bold">test="@id"</emphasis> <co xml:id="programlisting_xsl_if_test"/>> <xsl:text>INSERT INTO Customer (id, name) VALUES ('</xsl:text> <xsl:value-of select="@id" <co xml:id="programlisting_xsl_if_select_idattrib"/>/> <xsl:text>', '</xsl:text> <xsl:value-of select="." <co xml:id="programlisting_xsl_if_selectcontent"/>/> <xsl:text>')</xsl:text> <xsl:value-of select="$newline" <co xml:id="programlisting_xsl_if_usevar"/>/> </xsl:if> </xsl:for-each> </xsl:template></programlisting> <caption> <para>We want to export data from XML documents to a database server. For this purpose INSERT statements are generated from an XML document containing relevant data.</para> </caption> </figure> <calloutlist> <callout arearefs="programlisting_xsl_if_definevar"> <para>Define a file local variable <code>newline</code>. Dealing with text output frequently requires the insertion of newlines. Due to the syntax of the <tag class="element">xsl:text</tag> elements this tends to clutter the code.</para> </callout> <callout arearefs="programlisting_xsl_if_foreach"> <para>Iterate over the set of the sender node and all recipient nodes.</para> </callout> <callout arearefs="programlisting_xsl_if_test"> <para>The attribute value of <tag class="attribute">test</tag> will be <link xlink:href="http://www.w3.org/TR/xslt20/#xsl-if">evaluated</link> as a boolean. 
In this example it evaluates to <code>true</code> iff the attribute <tag class="attribute">id</tag> is defined for the context node. Since we are inside the <tag class="element">xsl:for-each</tag> block all context nodes are either of type <tag class="starttag">from</tag> or <tag class="starttag">to</tag> and thus <emphasis>may</emphasis> have an <tag class="attribute">id</tag> attribute.</para> </callout> <callout arearefs="programlisting_xsl_if_select_idattrib"> <para>The <tag class="attribute">id</tag> attribute's value is copied to the output. The <quote>@</quote> character in <code>select="@id"</code> tells the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor to read the value of an <emphasis>attribute</emphasis> with name <tag class="attribute">id</tag> rather than the content of a nested sub<emphasis>element</emphasis> like in <code><to id="foo"><id>I am nested!</id></to></code>.</para> </callout> <callout arearefs="programlisting_xsl_if_selectcontent"> <para>As stated earlier the dot <quote>.</quote> denotes the current context element. 
In this example simply the <code>#PCDATA</code> content is copied to the output.</para> </callout> <callout arearefs="programlisting_xsl_if_usevar"> <para>The <quote>$</quote> sign in front of <code>newline</code> tells the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor to access the variable <varname>newline</varname> previously defined in <coref linkend="programlisting_xsl_if_definevar"/> rather than interpreting it as the name of a sub element or an attribute.</para> </callout> </calloutlist> <para>As expected the recipient entry <quote>Adam Hacker</quote> does not appear due to the fact that no <tag class="attribute">id</tag> attribute is defined in its <tag class="starttag">to</tag> element:</para> <programlisting language="none"><computeroutput>INSERT INTO Customer (id, name) VALUES ('goik', 'Martin Goik') INSERT INTO Customer (id, name) VALUES ('eve', 'Eve Intruder')</computeroutput></programlisting> <qandaset defaultlabel="qanda" xml:id="example_position_last"> <title>The XPath functions position() and last()</title> <qandadiv> <qandaentry> <question> <para>We return to our recipient list in <xref linkend="figure_recipientlist_trailing_comma"/>. We are interested in a list of recipients avoiding the trailing comma:</para> <programlisting language="none"><computeroutput>Adam Hacker,Eve Intruder</computeroutput></programlisting> <para>We may use an <tag class="element">xsl:if</tag> to insert a comma for all but the very last recipient node. This can be achieved by using the <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> functions <link xlink:href="http://www.w3.org/TR/xpath#function-position">position()</link> and <link xlink:href="http://www.w3.org/TR/xpath#function-last">last()</link>. Hint: The comparison operator <quote><</quote> may be used in <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> to compare two integer numbers. 
However it must be escaped as <code>&lt;</code> in order to be XML compatible.</para> </question> <answer> <para>We have to exclude the comma for the last node of the recipient list. If we have e.g. 10 recipients the function <code>position()</code> will return integer values starting at 1 and ending with 10. So for the last node the comparison <code>10 < 10</code> will evaluate to false:</para> <programlisting language="none"><xsl:for-each select="memo/to"> <xsl:value-of select="."/> <xsl:if test="position() &lt; last()"> <xsl:text>,</xsl:text> </xsl:if> </xsl:for-each></programlisting> </answer> </qandaentry> <qandaentry xml:id="example_avoid_xsl_if"> <question> <label>Avoiding xsl:if</label> <para>In <xref linkend="programlisting_memo_export_sql"/> we used the <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> value <quote>from|to</quote> to select the desired sender and recipient nodes. Inside the <tag class="element">xsl:for-each</tag> block we permitted only those nodes which have an <tag class="attribute">id</tag> attribute. 
These two steps may be combined into a single <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression rendering the <tag class="element">xsl:if</tag> obsolete.</para> </question> <answer> <para>We simply need a modified <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> in the <tag class="element">for-each</tag>:</para> <programlisting language="none"><xsl:for-each select="<emphasis role="bold">from[@id]|to[@id]</emphasis>"> <xsl:text>INSERT INTO Customer (id, name) VALUES ('</xsl:text> <xsl:value-of select="@id"/> <xsl:text>', '</xsl:text> <xsl:value-of select="."/> <xsl:text>')</xsl:text> <xsl:value-of select="$newline"/> </xsl:for-each></programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="xsl_apply_templates"> <title><tag class="starttag">xsl:apply-templates</tag></title> <para>We already used <tag class="element">xsl:for-each</tag> to iterate over a list of element nodes. <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> offers a different possibility for this purpose. The idea is to define the formatting rules at a centralized location. The solution to <xref linkend="example_position_last"/> may thus be rewritten in an equivalent way:</para> <programlisting language="none"><xsl:template match="/"> <xsl:apply-templates select="memo/to" <co xml:id="programlisting_apply_templates_apply"/>/> </xsl:template> <xsl:template match="to" <co xml:id="programlisting_apply_templates_match"/>> <xsl:value-of select="."/> <xsl:if test="<emphasis role="bold">position()</emphasis> &lt; <emphasis role="bold">last()</emphasis>"> <xsl:text>,</xsl:text> </xsl:if> </xsl:template></programlisting> <calloutlist> <callout arearefs="programlisting_apply_templates_apply"> <para>Definition of the recipient node list. 
Each element of this list shall be processed further.</para> </callout> <callout arearefs="programlisting_apply_templates_match"> <para>This template <emphasis>may</emphasis> be used by an XSL processor to format nodes of type <tag class="starttag">to</tag>. Since the processor is asked to do exactly this in <xref linkend="programlisting_apply_templates_apply"/> the current template will <emphasis>really</emphasis> be used in this example.</para> </callout> </calloutlist> <para>The procedure outlined above may have the following advantages:</para> <itemizedlist> <listitem> <para>Some elements may appear at different places of a given document hierarchy. For example a <tag class="starttag">title</tag> element is likely to appear as a child of chapters, sections, tables, figures and so on. It may be sufficient to define a single template with a <code>match="title"</code> attribute which contains all rules being required.</para> </listitem> <listitem> <para>Sometimes the body of a <tag class="starttag">xsl:for-each</tag> ... <tag class="endtag">xsl:for-each</tag> spans multiple screens thus limiting code readability. Factoring out the body into a template may avoid this obstacle.</para> </listitem> </itemizedlist> <para>This method is well known from programming languages: If the code inside a loop is needed multiple times or reaches a painful line count <emphasis>good</emphasis> programmers tend to define a separate method. For example:</para> <programlisting language="none">for (int i = 0; i < 10; i++){ if (a[i] < b[i]){ max[i] = b[i]; } else { max[i] = a[i]; } ... }</programlisting> <para>Inside the loop's body the maximum of two values gets computed. This may be needed at several locations and thus it is convenient to centralize this code into a method:</para> <programlisting language="none">// cf. <xsl:template match="..."> static int maximum(int a, int b){ if (a < b){ return b; } else { return a; } } ... // cf. 
<xsl:apply-templates select="..."/> for (int i = 0; i < 10; i++){ max[i] = maximum(a[i], b[i]); }</programlisting> <para>So far calling a static method in <link linkend="gloss_Java"><trademark>Java</trademark></link> may be compared to an <tag class="starttag">xsl:apply-templates</tag>. There is however one big difference. In <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> the <quote>method</quote> being called may not exist at all. An <tag class="starttag">xsl:apply-templates</tag> instructs a processor to format a set of nodes. It does not contain information about any rules being defined to do this job:</para> <programlisting language="none"><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:output method="text"/> <xsl:template match="/memo"> <xsl:apply-templates <emphasis role="bold">select="content"</emphasis>/> </xsl:template> </xsl:stylesheet></programlisting> <para>Since no suitable template supplying rules for <tag class="starttag">content</tag> nodes exists an <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor uses a default formatting rule instead:</para> <programlisting language="none"><computeroutput>Thanks for your excellent work.Our firewall is definitely broken! This bug has been reported by the sender.</computeroutput></programlisting> <para>We observe that the <code>#PCDATA</code> content strings of the element itself and all (recursive) sub elements get glued together into one string. In most cases this is definitely not intended. Omitting a necessary template is usually a programming error. 
It is thus good programming practice during style sheet development to define a special template catching forgotten rules:</para> <programlisting language="none"><xsl:template match="/memo"> <xsl:apply-templates select="content"/> </xsl:template> <xsl:template match="*"> <xsl:message> <xsl:text>Error: No template defined matching element '</xsl:text> <xsl:value-of select="name(.)"/> <xsl:text>'</xsl:text> </xsl:message> </xsl:template></programlisting> <para>The <quote>*</quote> matches any element if there is no <link xlink:href="http://www.w3.org/TR/xslt20/#conflict">better matching</link> rule defined. Since we did not supply any template for <tag class="starttag">content</tag> nodes at all this default template will match nodes of type <tag class="starttag">content</tag>. The function <code>name()</code> is predefined in <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> and returns the element type name of a node. During the formatting process we will now see the following warning message:</para> <programlisting language="none"><computeroutput>Error: No template defined matching element 'content'</computeroutput></programlisting> <para>We note that for document nodes <tag class="starttag">xyz</tag><code>foo</code><tag class="endtag">xyz</tag> containing only <code>#PCDATA</code> a simple <tag class="emptytag">xsl:apply-templates select="xyz"</tag> is sufficient: A <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor uses its default rule and copies the node's content <code>foo</code> to its output.</para> <qandaset defaultlabel="qanda" xml:id="example_rdbms_person"> <title>Extending the export to a RDBMS</title> <qandadiv> <qandaentry> <question> <para>We assume that our RDBMS table <code>Customer</code> from <xref linkend="programlisting_memo_export_sql"/> shall be replaced by a table <code>Person</code>. We expect the senders of memo documents to be employees of a given company. 
Conversely the recipients of memos are expected to be customers. Our <code>Person</code> table shall have a <quote>tag</quote> like column named <code>type</code> having exactly two allowed values <code>customer</code> or <code>employee</code> being controlled by a <code>CHECK</code> constraint, see <xref linkend="table_person"/>. Create a style sheet generating the necessary SQL statements from a memo document instance. Hint: Define two different templates for <tag class="starttag">from</tag> and <tag class="starttag">to</tag> nodes.</para> </question> <answer> <para>We define two templates differing only in the static string value for a person's type. The relevant <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> portion reads:<programlisting language="none"><xsl:template match="/memo"> <xsl:apply-templates select="from|to"/> </xsl:template> <xsl:template match="from"> <xsl:text>INSERT INTO Person (name, type) VALUES('</xsl:text> <xsl:value-of select="."/> <xsl:text>', <emphasis role="bold">'employee'</emphasis>)</xsl:text> <xsl:value-of select="$newline"/> </xsl:template> <xsl:template match="to"> <xsl:text>INSERT INTO Person (name, type) VALUES('</xsl:text> <xsl:value-of select="."/> <xsl:text>', <emphasis role="bold">'customer'</emphasis>)</xsl:text> <xsl:value-of select="$newline"/> </xsl:template></programlisting></para> </answer> </qandaentry> </qandadiv> </qandaset> <table xml:id="table_person"> <title>The Person table</title> <?dbhtml table-width="30%" ?> <?dbfo table-width="40%" ?> <tgroup cols="2"> <colspec colwidth="3*"/> <colspec colwidth="2*"/> <thead> <row> <entry>name</entry> <entry>type</entry> </row> </thead> <tbody> <row> <entry>Martin Goik</entry> <entry>employee</entry> </row> <row> <entry>Adam Hacker</entry> <entry>customer</entry> </row> <row> <entry>Eve intruder</entry> <entry>customer</entry> </row> </tbody> </tgroup> </table> </section> <section xml:id="xsl_choose"> <title><tag class="starttag">xsl:choose</tag></title> 
<para>We already described the <tag class="starttag">xsl:if</tag> which can be compared to an <code>if(..){...}</code> statement in many programming languages. The <tag class="starttag">xsl:choose</tag> element can be compared to a chain of <code>if</code> / <code>else if</code> conditions including an optional final <code>else</code> block being reached if all boolean tests fail:</para> <programlisting language="none">if (condition a){ ...//block a } else if (condition b){ ... //block b } ... ... else { ... //code being reached when all conditions evaluate to false }</programlisting> <para>We want to generate a list of memo recipient names numbered by Roman numerals up to 10. Higher numbers shall be displayed in ordinary decimal notation:</para> <programlisting language="none"><computeroutput>I:Adam Hacker II:Eve intruder III: ... IV: ... ...</computeroutput></programlisting> <para>Though <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> offers <link xlink:href="http://www.w3.org/TR/xslt20/#convert">a better way</link> we may generate these number literals by:</para> <programlisting language="none"><xsl:template match="/memo"> <xsl:apply-templates select="to"/> </xsl:template> <xsl:template match="to"> <xsl:choose> <xsl:when test="1 = position()">I</xsl:when> <xsl:when test="2 = position()">II</xsl:when> <xsl:when test="3 = position()">III</xsl:when> <xsl:when test="4 = position()">IV</xsl:when> <xsl:when test="5 = position()">V</xsl:when> <xsl:when test="6 = position()">VI</xsl:when> <xsl:when test="7 = position()">VII</xsl:when> <xsl:when test="8 = position()">VIII</xsl:when> <xsl:when test="9 = position()">IX</xsl:when> <xsl:when test="10 = position()">X</xsl:when> <xsl:otherwise> <xsl:value-of select="position()"/> </xsl:otherwise> </xsl:choose> <xsl:text>:</xsl:text> <xsl:value-of select="."/> <xsl:value-of select="$newline"/> </xsl:template></programlisting> <para>Note that this conversion is incomplete: If the number in question is larger than 10 it will be formatted 
in ordinary decimal style according to the <tag class="starttag">xsl:otherwise</tag> clause.</para> </section> <section xml:id="section_html_book"> <title>A complete HTML formatting example</title> <para>We now present a series of exercises showing how to format <tag class="starttag">book</tag> document instances to XHTML. This is done in a step by step manner each time showing correspondent code snippets for our <filename>memo.xsd</filename>.</para> <section xml:id="section_memo_to_list"> <title>Listing the recipients of a memo</title> <para>In order to generate a XHTML <link xlink:href="http://www.w3.org/TR/html401/struct/lists.html#h-10.2">list</link> of all <tag class="starttag">memo</tag> recipients of a memo we have to use <tag class="starttag">xsl:output method="xhtml"</tag> and embed the required HTML tags in our <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet:</para> <programlisting language="none"><xsl:output method="xhtml" indent="yes"/> <xsl:template match="/memo"> <html> <head> <title>Recipient list</title> </head> <body> <ul> <xsl:apply-templates select="to"/> </ul> </body> </html> </xsl:template> <xsl:template match="to"> <li> <xsl:value-of select="."/> </li> </xsl:template></programlisting> <para>Processing this style sheet for a <tag class="starttag">memo</tag> document instance yields:</para> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <html> <head> <title>Recipient list</title> </head> <body> <ul> <li>Adam Hacker</li> <li>Eve intruder</li> </ul> </body> </html></programlisting> <para>The generated Xhtml code does not contain a reference to a DTD. 
We may supply this reference by modifying our <tag class="emptytag">xsl:output</tag> directive:</para> <programlisting language="none"><xsl:output method="xhtml" indent="yes" <emphasis role="bold">doctype-public</emphasis>="-//W3C//DTD XHTML 1.0 Strict//EN" <emphasis role="bold">doctype-system</emphasis>="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/></programlisting> <para>This adds a corresponding header which allows the generated HTML to be validated:</para> <programlisting language="none"><!DOCTYPE html PUBLIC "<emphasis role="bold">-//W3C//DTD XHTML 1.0 Strict//EN</emphasis>" "<emphasis role="bold">http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</emphasis>"> <html><head> ...</programlisting> <para>This may be improved further by instructing the XSL formatter to use <uri xlink:href="http://www.w3.org/1999/xhtml">http://www.w3.org/1999/xhtml</uri> as default namespace:</para> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet <emphasis role="bold">xmlns="http://www.w3.org/1999/xhtml"</emphasis> xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:output method="xhtml" indent="yes" doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/> <xsl:template match="/"> <html><head> ... </xsl:template> ... </xsl:stylesheet></programlisting> <para>This yields the following output:</para> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html <emphasis role="bold">xmlns="http://www.w3.org/1999/xhtml"</emphasis>> <head> ... </html></programlisting> <para>The top level element <tag class="element">html</tag> is now declared to belong to the namespace <code>http://www.w3.org/1999/xhtml</code>. 
This will be inherited by all inner Xhtml elements.</para> <qandaset defaultlabel="qanda" xml:id="example_xsl_book_1_dtd"> <title>Transforming book instances to Xhtml</title> <qandadiv> <qandaentry> <question> <para>Create a <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet to transform instances of the first version of <link endterm="example_bookDtd" linkend="example_bookDtd">book.xsd</link> (<xref linkend="example_bookDtd"/>) into <uri xlink:href="http://www.w3.org/TR/xhtml1/#a_dtd_XHTML-1.0-Strict">Xhtml 1.0 strict</uri>.</para> <para>You should first construct a Xhtml document <emphasis>manually</emphasis> before coding the XSL. After you have a <quote>working</quote> Xhtml example document create a <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet which transforms arbitrary <filename>book.xsd</filename> document instances into a corresponding Xhtml file.</para> </question> <answer> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:output indent="yes" method="xhtml"/> <xsl:template match="/book"> <html> <head> <title><xsl:value-of select="title"/></title> </head> <body> <h1><xsl:value-of select="title"/></h1> <xsl:apply-templates select="chapter"/> </body> </html> </xsl:template> <xsl:template match="chapter"> <h2><xsl:value-of select="title"/></h2> <xsl:apply-templates select="para"/> </xsl:template> <xsl:template match="para"> <p><xsl:value-of select="."/></p> </xsl:template> </xsl:stylesheet></programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="section_xsl_attribute"> <title><tag class="starttag">xsl:attribute</tag></title> <para>Sometimes we want to set attribute values in a generated XML document. 
For example we might want to set the background color <quote>red</quote> if a memo has a priority value of <tag class="attvalue">high</tag>:</para> <programlisting language="none"><h1 style="background:red">Firewall problems</h1></programlisting> <para>Regarding our memo example this may be achieved by:</para> <programlisting language="none"><xsl:template match="/memo"> <html> ... <body> <xsl:variable name="<emphasis role="bold">messageColor</emphasis>" <co xml:id="programlisting_priority_lolor_vardef"/>> <xsl:choose> <xsl:when test="@priority = 'low'">green</xsl:when> <xsl:when test="@priority = 'medium'">yellow</xsl:when> <xsl:when test="@priority = 'high'">red</xsl:when> </xsl:choose> </xsl:variable> <h1 style="background:{<emphasis role="bold">$messageColor</emphasis>};" <co xml:id="programlisting_priority_lolor_usevar"/>> <xsl:value-of select="subject"/> </h1> </body> </html> </xsl:template></programlisting> <calloutlist> <callout arearefs="programlisting_priority_lolor_vardef"> <para>Definition of a color name depending on the attribute <tag class="attvalue">priority</tag>'s value. The set of possible attribute values (low, medium, high) is mapped to the color names (green, yellow, red).</para> </callout> <callout arearefs="programlisting_priority_lolor_usevar"> <para>The color variable is used to compose the attribute <tag class="attribute">style</tag>'s value. The curly <code>{...}</code> braces are part of the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> standard's syntax. They are required here to instruct the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> processor to substitute the local variable <code>messageColor</code>'s value instead of simply copying the literal string <quote><code>$messageColor</code></quote> itself to the output document e.g. 
generating <tag class="starttag">h1 style = "background:$messageColor;"</tag>.</para> </callout> </calloutlist> <para>Instead of constructing an extra variable <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> offers a slightly more compact way for the same purpose. The <tag class="starttag">xsl:attribute</tag> element allows us to define the name of an attribute to be added together with an attribute value specification:</para> <programlisting language="none"><xsl:template match="/memo"> <html> ... <h1> <xsl:attribute name="<emphasis role="bold">style</emphasis>"> <xsl:text>background:</xsl:text> <xsl:choose> <xsl:when test="@priority = 'low'">green</xsl:when> <xsl:when test="@priority = 'medium'">yellow</xsl:when> <xsl:when test="@priority = 'high'">red</xsl:when> </xsl:choose> </xsl:attribute> <xsl:value-of select="subject"/> </h1> </body> </html> </xsl:template></programlisting> <qandaset defaultlabel="qanda" xml:id="example_book_toc"> <title>Adding a table of contents (toc)</title> <qandadiv> <qandaentry> <question> <para>For larger document instances it is convenient to add a table of contents to the generated Xhtml document. <!-- We demonstrate the desired result as an <uri xlink:href="src/viewlet/bookhtmltoc/bookhtmltoc_viewlet_swf.html">animation</uri>.--></para> <para>For this exercise you need a unique string value for each <tag class="starttag">chapter</tag> node. If a <tag class="starttag">chapter</tag>'s <tag class="attribute">id</tag> attribute had been declared as <code>#REQUIRED</code> its value would do this job perfectly. Unfortunately you cannot rely on its existence since it is declared to be <code>#IMPLIED</code> and may thus be absent.</para> <para>XSL offers a standard function for this purpose namely <link xlink:href="http://www.w3.org/TR/xslt20/#generate-id">generate-id(...)</link>. 
In a nutshell this function takes an XML node as an argument (or, when called without arguments, uses the context node) and creates a string value being unique with respect to <emphasis>all</emphasis> other nodes in the document. For a given node the function may be called repeatedly and is guaranteed to always return the same value during the <emphasis>same</emphasis> transformation run. So it suffices to add something like <tag class="starttag">a href="#{generate-id(...)}"</tag> or use it in conjunction with <tag class="starttag">xsl:attribute</tag>.</para> </question> <answer> <para>We use the <code>generate-id()</code> function to create a unique identity string for each chapter node. Since we also want to define links to the table of contents we need another unique string value. It is tempting to simply use a static value like <quote>__toc__</quote> for this purpose. However we cannot be sure that this value does not coincide with one of the <code>generate-id()</code> function return values.</para> <para>A cleaner solution uses the <tag class="starttag">book</tag> node's generated identity string for this purpose. As stated before this value is definitely unique:</para> <programlisting language="none"><xsl:template match="/book"> ... 
<body> <h1><xsl:value-of select="title"/></h1> <h2 id="{generate-id(.)}" <co xml:base="" xml:id="programlisting_book_toc_def_toc"/>>Table of contents</h2> <ul> <xsl:for-each select="chapter"> <li> <a href="#{generate-id(.)}" <co xml:base="" xml:id="programlisting_book_toc_ref_chap"/>><xsl:value-of select="title"></xsl:value-of></a> </li> </xsl:for-each> </ul> <xsl:apply-templates select="chapter"/> </body> </html> </xsl:template> <xsl:template match="chapter"> <h2 id="{generate-id(.)}" <co xml:base="" xml:id="programlisting_book_toc_def_chap"/>> <a href="#{generate-id(/book)}" <co xml:base="" xml:id="programlisting_book_toc_ref_toc"/>> <xsl:value-of select="title"/> </a> </h2> <xsl:apply-templates select="para"/> </xsl:template> ...</programlisting> <calloutlist> <callout arearefs="programlisting_book_toc_def_toc"> <para>The current context node is <tag class="starttag">book</tag>. We use it as argument to <code>generate-id()</code> to create a unique identity string.</para> </callout> <callout arearefs="programlisting_book_toc_ref_chap"> <para>The <tag class="starttag">xsl:for-each</tag> iterates over all <tag class="starttag">chapter</tag> nodes. We reference the corresponding target nodes being created in <xref linkend="programlisting_book_toc_def_chap"/>.</para> </callout> <callout arearefs="programlisting_book_toc_def_chap"> <para>Each <tag class="starttag">chapter</tag>'s heading is supplied with a unique identity string being referenced from <xref linkend="programlisting_book_toc_ref_chap"/>.</para> </callout> <callout arearefs="programlisting_book_toc_ref_toc"> <para>Clicking on a chapter's title shall take us back to the table of contents (toc). 
So we create a hypertext link referencing our toc heading's identity string being defined in <xref linkend="programlisting_book_toc_def_toc"/>.</para> </callout> </calloutlist> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="section_xsl_mixed"> <title>XSL and mixed content</title> <para>The subsequent example shows an element <tag class="starttag">content</tag> having a mixed content model possibly containing <tag class="starttag">url</tag> and <tag class="starttag">emphasis</tag> child nodes:</para> <programlisting language="none"><content>The <emphasis role="bold"><url href="http://w3.org/XML">XML</url></emphasis> language is <emphasis role="bold"><emphasis>easy</emphasis></emphasis> to learn. However you need some <emphasis role="bold"><emphasis>time</emphasis></emphasis>.</content></programlisting> <para>Embedded element nodes have been set to bold style in order to distinguish them from text nodes. A possible <acronym>XHtml</acronym> output might look like:</para> <programlisting language="none"><p>The <emphasis role="bold"><a href="http://w3.org/XML">XML</a></emphasis> language is <emphasis role="bold"><em>easy</em></emphasis> to learn. However you need some <emphasis role="bold"><em>time</em></emphasis>.</p></programlisting> <para>We start with a first version of an <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> template:</para> <programlisting language="none"> <xsl:template match="content"> <p> <xsl:value-of select="."/> </p> </xsl:template></programlisting> <para>As mentioned earlier all <code>#PCDATA</code> text nodes of the whole subtree are glued together leading to:</para> <programlisting language="none"><p>The XML language is easy to learn. However you need some time.</p></programlisting> <para>Our next attempt is to define templates to format the elements <tag class="starttag">url</tag> and <tag class="starttag">emphasis</tag>:</para> <programlisting language="none">... 
<xsl:template match="content"> <p> <xsl:apply-templates select="emphasis|url"/> </p> </xsl:template> <xsl:template match="url"> <a href="{@href}"><xsl:value-of select="."/></a> </xsl:template> <xsl:template match="emphasis"> <em><xsl:value-of select="."/></em> </xsl:template> ...</programlisting> <para>As expected the sub elements are formatted correctly. Unfortunately the <code>#PCDATA</code> text nodes between the element nodes are lost:</para> <programlisting language="none"><p> <a href="http://w3.org/XML">XML</a> <em>easy</em> <em>time</em> </p></programlisting> <para>To correct this transformation script we have to tell the formatting processor to include bare text nodes into the output. The <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> standard defines the node test <link xlink:href="http://www.w3.org/TR/xpath#path-abbrev">text()</link> for this purpose. It selects all text node children of the context node:</para> <programlisting language="none">... <xsl:template match="content"> <p> <xsl:apply-templates select="<emphasis role="bold">text()</emphasis>|emphasis|url"/> </p> </xsl:template> ...</programlisting> <para>This yields the desired output. The text nodes in the result are shown in bold style:</para> <programlisting language="none"><p><emphasis role="bold">The</emphasis> <a href="http://w3.org/XML">XML</a><emphasis role="bold"> language is </emphasis><em>easy</em><emphasis role="bold"> to learn. 
However you need some </emphasis><em>time</em><emphasis role="bold">.</emphasis></p></programlisting> <para>Some remarks:</para> <orderedlist> <listitem> <para>The <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression <code>select="text()|emphasis|url"</code> corresponds nicely to the schema's content model definition:</para> <programlisting language="none"><xs:element name="content"> <xs:complexType <emphasis role="bold">mixed="true"</emphasis>> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element <emphasis role="bold">ref="emphasis"</emphasis>/> <xs:element <emphasis role="bold">ref="url"</emphasis>/> </xs:choice> ... </xs:complexType> </xs:element></programlisting> </listitem> <listitem> <para>In most mixed content models <emphasis>all</emphasis> sub elements of e.g. <tag class="starttag" role="">content</tag> have to be formatted. During development some of the elements defined in a schema are likely to be omitted by accident. For this reason the <quote>typical</quote> <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression acting on mixed content models is written to match <emphasis>any</emphasis> sub element node:</para> <programlisting language="none">select="text()|<emphasis role="bold">*</emphasis>"</programlisting> </listitem> <listitem> <para>Regarding <code>select="text()|emphasis|url"</code> we have defined two templates for element nodes <tag class="starttag">emphasis</tag> and <tag class="starttag">url</tag>. What happens to those text nodes being matched by <code>text()</code>? These are subject to a default rule: The content of bare text nodes is written to the output. 
We may however redefine this default rule by adding a template:</para> <programlisting language="none"><xsl:template match="text()"> <emphasis role="bold"><span style="color:red"> <xsl:value-of select="."/> </span></emphasis> </xsl:template></programlisting> <para>This yields:</para> <programlisting language="none"><p> <emphasis role="bold"><span style="color:red">The </span></emphasis> <a href="http://w3.org/XML">XML</a> <emphasis role="bold"><span style="color:red"> language is </span></emphasis> <em>easy</em> <emphasis role="bold"><span style="color:red"> to learn. However you need some </span></emphasis> <em>time</em> <emphasis role="bold"><span style="color:red">.</span></emphasis> </p></programlisting> <para>In most cases it is not desired to replace all text nodes throughout the whole document. In the current example we might only format text nodes being <emphasis>immediate</emphasis> children of <tag class="starttag">content</tag>. This may be achieved by restricting the <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> expression to <tag class="starttag">xsl:template match="content/text()"</tag>.</para> </listitem> </orderedlist> </section> <section xml:id="section_xsl_functionid"> <title>The function <code>id()</code></title> <para>In <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> we sometimes want to lookup nodes by an attribute value of type <link xlink:href="???">ID</link>. We consider our product catalog from <xref linkend="sectSchemaProductCatalog"/>. 
The following <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> may be used to create <acronym>XHtml</acronym> documents from <tag class="starttag">catalog</tag> instances:</para> <programlisting language="none" xml:lang=""><xsl:template match="/catalog"> <html> <head><title>Product catalog</title></head> <body> <h1>List of Products</h1> <xsl:apply-templates select="product"/> </body> </html> </xsl:template> <xsl:template match="product"> <h2 id="{@id}" <co xml:base="" xml:id="programlisting_catalog2html_v1_defid"/>><xsl:value-of select="title"/></h2> <xsl:apply-templates select="para"/> </xsl:template> <xsl:template match="para"> <p><xsl:apply-templates select="text()|*" <co xml:id="programlisting_catalog2html_v1_mixed"/>/></p> </xsl:template> <xsl:template match="link"> <a href="#{@ref}" <co xml:id="programlisting_catalog2html_v1_refid"/>><xsl:value-of select="."/></a> </xsl:template></programlisting> <calloutlist> <callout arearefs="programlisting_catalog2html_v1_defid"> <para>The value of the <code>ID</code> attribute in <tag class="starttag">product id="foo"</tag> is unique within the document instance. 
We may thus use it as a unique string value in the generated Xhtml, too.</para> </callout> <callout arearefs="programlisting_catalog2html_v1_mixed"> <para>Mixed content consisting of text and <tag class="starttag">link</tag> nodes.</para> </callout> <callout arearefs="programlisting_catalog2html_v1_refid"> <para>We define a file local Xhtml reference to a product.</para> </callout> </calloutlist> <para>The <tag class="starttag">para</tag> element from the example document instance containing a <tag class="starttag">link ref="homeTrainer"</tag> reference will be formatted as:</para> <programlisting language="none"><p>If you hate rain look <a href="#homeTrainer">here</a>.</p></programlisting> <para>Now suppose we want to add the product's title <emphasis>Home trainer</emphasis> here to give the reader an idea about the product without clicking the hypertext link:</para> <programlisting language="none"><p>If you hate rain look <a href="#homeTrainer">here</a> <emphasis role="bold">(Home trainer)</emphasis>.</p></programlisting> <para>This title text node is part of the <tag class="starttag">product</tag> node being referenced from the current <tag class="starttag">para</tag>:</para> <figure xml:id="linkIdrefProduct"> <title>A graphical representation of our <tag class="starttag">catalog</tag>.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xsl_id.fig"/> </imageobject> <caption> <para>The dashed line shows the <code>IDREF</code> based reference from the <tag class="starttag">link</tag> to the <tag class="starttag">product</tag> node.</para> </caption> </mediaobject> </figure> <para>In <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> we may follow an <code>ID</code> reference by means of the built-in function <link xlink:href="http://www.w3.org/TR/xpath#function-id">id(...)</link>:</para> <programlisting language="none"><xsl:template match="link"> <a href="#{@ref}"><xsl:value-of select="."/></a> <xsl:text> (</xsl:text> <xsl:value-of 
select="<emphasis role="bold">id(@ref)</emphasis>/title" <co xml:id="programlisting_xsl_id_follow"/>/> <xsl:text>)</xsl:text> </xsl:template></programlisting> <para>Evaluating <code>id(@ref)</code> at <xref linkend="programlisting_xsl_id_follow"/> returns the referenced <tag class="starttag">product</tag> <emphasis>node</emphasis>. We simply take its <tag class="starttag">title</tag> value and embed it into a pair of parentheses. This way the desired text portion <emphasis role="bold">(Home trainer)</emphasis> gets added after the hypertext link.</para> <qandaset defaultlabel="qanda" xml:id="example_book_xsl_mixed"> <title>Extending the book style sheet by mixed content and itemized lists</title> <qandadiv> <qandaentry> <question> <para>In <xref linkend="example_book.dtd_v5"/> we constructed a schema allowing itemized lists and mixed content for <tag class="starttag">book</tag> instances. This schema also defines <tag class="starttag">emphasis</tag>, <tag class="starttag">table</tag> and <tag class="starttag">link</tag> elements as part of a mixed content definition. Extend the current book2html.xsl to account for these extensions.</para> <para xlink:href="http://www.w3.org/TR/xslt20/#element-copy-of">As we already saw in our memo example itemized lists in Xhtml are represented by the element <tag class="starttag">ul</tag> containing <tag class="starttag">li</tag> elements. Since <tag class="starttag">p</tag> elements are also allowed to appear as children our itemized lists can be easily mapped to Xhtml tags. A <tag class="starttag">link</tag> node may be transformed into an <tag class="starttag">a href="..."</tag> Xhtml node.</para> <para>The table model is a simplified version of the Xhtml table model. 
Read the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> documentation of the element <tag class="emptytag">xsl:copy-of</tag> at <link xlink:href="http://www.w3.org/TR/xslt20/#element-copy-of">copy-of</link> for processing tables.</para> </question> <answer> <para>The full source code of the solution is available at <link xlink:href="Ref/src/Dtd/book/v5/book2html.1.xsl">(Online HTML version) ... book2html.1.xsl</link>. We discuss some important aspects. The following table provides mapping rules from <filename>book.xsd</filename> to Xhtml:</para> <table xml:id="table_book2xhtml_element_mappings"> <title>Mapping elements from <filename>book.xsd</filename> to Xhtml</title> <?dbhtml table-width="50%" ?> <?dbfo table-width="50%" ?> <tgroup cols="2"> <colspec colwidth="3*"/> <colspec colwidth="2*"/> <thead> <row> <entry>book.xsd</entry> <entry>Xhtml</entry> </row> </thead> <tbody> <row> <entry><tag class="starttag">book</tag>/<tag class="starttag">title</tag></entry> <entry><tag class="starttag">h1</tag></entry> </row> <row> <entry><tag class="starttag">chapter</tag>/<tag class="starttag">title</tag></entry> <entry><tag class="starttag">h2</tag></entry> </row> <row> <entry><tag class="starttag">para</tag> (mixed content)</entry> <entry><tag class="starttag">p</tag></entry> </row> <row> <entry><tag class="starttag">link href="foo"</tag></entry> <entry><tag class="starttag">a href="foo"</tag></entry> </row> <row> <entry><tag class="starttag">emphasis</tag></entry> <entry><tag class="starttag">em</tag></entry> </row> <row> <entry><tag class="starttag">itemizedlist</tag></entry> <entry><tag class="starttag">ul</tag></entry> </row> <row> <entry><tag class="starttag">listitem</tag></entry> <entry><tag class="starttag">li</tag></entry> </row> <row> <entry><tag class="starttag">table</tag>, <tag class="starttag">caption</tag>,<tag class="starttag">tr</tag>, <tag class="starttag">td</tag> along with all attributes</entry> <entry>Identity copy</entry> </row> 
</tbody> </tgroup> </table> <para>Since our table model is a subset of the HTML table model we may simply copy corresponding nodes to the output:</para> <programlisting language="none"><xsl:template match="table"> <xsl:copy-of select="."/> </xsl:template></programlisting> <para>Next we need rules for itemized lists and paragraphs. Our model already implements lists in a way that closely resembles XHTML lists. Since the structures are compatible we only have to provide a mapping:</para> <programlisting language="none"><xsl:template match="para"> <p id="{generate-id(.)}"><xsl:apply-templates select="text()|*" /></p> </xsl:template> <xsl:template match="itemizedlist"> <ul><xsl:apply-templates select="listitem"/></ul> </xsl:template> <xsl:template match="listitem"> <li><xsl:apply-templates select="*"/></li> </xsl:template></programlisting> <para>Since <emphasis>all</emphasis> chapters are reachable via hypertext links from the table of contents we <emphasis>must</emphasis> supply a unique <code>id</code> value <xref linkend="programlisting_book2html_single_chapterid"/> for <emphasis>all</emphasis> of them. Chapters and paragraphs may be referenced by <tag class="starttag">link</tag> elements and thus <emphasis>both</emphasis> need a unique identity value. For simplicity we create both of them via <code>generate-id()</code>. In a more sophisticated solution the strategy would be slightly different:</para> <itemizedlist> <listitem> <para>If a <tag class="starttag">chapter</tag> node does have an <code>id</code> attribute defined then take its value.</para> </listitem> <listitem> <para>If a <tag class="starttag">chapter</tag> node does <emphasis>not</emphasis> have an <code>id</code> attribute defined then use <code>generate-id()</code>.</para> </listitem> <listitem> <para><tag class="starttag">para</tag> nodes only get an <code>id</code> value in XHTML if they do have an <code>id</code> attribute defined. This is consistent since these nodes are never referenced from the table of contents. 
Thus an identity is only required if the <tag class="starttag">para</tag> node is referenced by a <tag class="starttag">link</tag>. If that is a case the <tag class="starttag">para</tag> surely does have a defined identity value.</para> </listitem> </itemizedlist> <para>We also have to provide a hypertext link <xref linkend="programlisting_book2html_single_toclink"/> to the table of contents:</para> <programlisting language="none"><xsl:template match="chapter"> <h2 id="{<emphasis role="bold">generate-id(.)</emphasis>}" <co xml:base="" xml:id="programlisting_book2html_single_chapterid"/>> <a href="#{<emphasis role="bold">generate-id(/book)</emphasis>}" <co xml:base="" xml:id="programlisting_book2html_single_toclink"/>><xsl:value-of select="title"/></a> </h2> <xsl:apply-templates select="para|itemizedlist|table"/> </xsl:template></programlisting> <para>Implementing the <tag class="starttag">link</tag> element is somewhat more complicated. We cannot use the <code>@ref</code> attribute values itself as <tag class="starttag">a href="..."</tag> attribute values since the target's identity string is generated via <code>generate-id()</code>. But we may follow the reference via the <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> <link linkend="section_xsl_functionid">id()</link> function and then use the target's identity value:</para> <programlisting language="none"><xsl:template match="link"> <a href="#{generate-id(id(@linkend))}"> <xsl:value-of select="."/> </a> </xsl:template></programlisting> <para>The call to <code>id(@linkend)</code> returns either a <tag class="starttag">chapter</tag> or a <tag class="starttag">para</tag> node since attributes of type <code>ID</code> are only defined for these two elements. 
Using this node as input to <code>generate-id()</code> returns the desired identity value to be used in the generated Xhtml.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="xslAxis"> <title>XSL axis definitions</title> <para>XSL allows us to traverse a document instance's graph in different directions. We start with a memo document instance:</para> <programlisting language="none"><memo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="memo.xsd" date="9.9.2099"> <from>Joe</from> <to>Jack</to> <to>Eve</to> <to>Jude</to> <to>Tolstoi</to> <subject>Ignore me!</subject> <content> <para>Dumb text.</para> </content> </memo></programlisting> <para>This instance defines four nodes of type <tag class="starttag">to</tag>. For each of these we want to create a line of text showing also the preceding and the following recipients:</para> <programlisting language="none"> <----Jack----> Eve Jude Tolstoi <co xml:id="programlisting_axis_jack"/> Jack <----Eve----> Jude Tolstoi <co xml:id="programlisting_axis_eve"/> Jack Eve <----Jude----> Tolstoi <co xml:id="programlisting_axis_jude"/> Jack Eve Jude <----Tolstoi----> <co xml:id="programlisting_axis_tolstoi"/></programlisting> <calloutlist> <callout arearefs="programlisting_axis_jack"> <para>Jack has no predecessor and 3 successors</para> </callout> <callout arearefs="programlisting_axis_eve"> <para>Eve has 1 predecessor and 2 successors</para> </callout> <callout arearefs="programlisting_axis_jude"> <para>Jude has 2 predecessors and 1 successor</para> </callout> <callout arearefs="programlisting_axis_tolstoi"> <para><personname>Tolstoi</personname> has 3 predecessors and no successor</para> </callout> </calloutlist> <para>XSL supports this type of transformation by supplying <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> axis definitions. 
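The sibling axes may also be explored outside a stylesheet. The following <link linkend="gloss_Java"><trademark>Java</trademark></link> sketch is an illustration only: the class <classname>SiblingAxisSketch</classname> and its method <function>describeRecipients</function> are hypothetical names which do not belong to the lecture sources, whereas the <code>javax.xml.xpath</code> calls are part of the standard JDK. It counts the preceding and following <tag class="starttag">to</tag> siblings of each recipient in a shortened copy of the memo: <programlisting language="java">import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical demo class, not part of the lecture sources. */
public class SiblingAxisSketch {

    static final String MEMO = "<memo><from>Joe</from><to>Jack</to><to>Eve</to>"
        + "<to>Jude</to><to>Tolstoi</to><subject>Ignore me!</subject></memo>";

    /** Returns one line per recipient, e.g. "Eve: 1 preceding, 2 following". */
    public static String describeRecipients(final String xml) throws Exception {
        final XPath xpath = XPathFactory.newInstance().newXPath();
        final NodeList recipients = (NodeList) xpath.evaluate(
            "/memo/to", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        final StringBuilder result = new StringBuilder();
        for (int i = 0; i < recipients.getLength(); i++) {
            final Node to = recipients.item(i); // context node for both axes
            final double preceding = (Double) xpath.evaluate(
                "count(preceding-sibling::to)", to, XPathConstants.NUMBER);
            final double following = (Double) xpath.evaluate(
                "count(following-sibling::to)", to, XPathConstants.NUMBER);
            result.append(to.getTextContent()).append(": ")
                  .append((int) preceding).append(" preceding, ")
                  .append((int) following).append(" following\n");
        }
        return result.toString();
    }

    public static void main(final String[] args) throws Exception {
        System.out.print(describeRecipients(MEMO));
    }
}</programlisting> The counts reported for Jack, Eve, Jude and Tolstoi correspond to the predecessor and successor numbers stated in the callouts above. 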
We consider a memo document with 9 <tag class="starttag">to</tag> nodes:</para> <figure xml:id="memo9recipients"> <title>A memo with 9 recipients</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/memofour.fig"/> </imageobject> </mediaobject> </figure> <para>We marked the 4-th recipient to represent the context node. All three <tag class="starttag">to</tag> nodes to the <quote>left</quote> belong to the <emphasis>set</emphasis> of preceding siblings with respect to the context node. Likewise the 5 neighbours to the right are called following siblings. Returning to our <quote>four recipient</quote> example we may create the desired output by:</para> <programlisting language="none"><xsl:template match="/"> <xsl:apply-templates select="memo/to"/> </xsl:template> <xsl:template match="to"> <xsl:for-each select="preceding-sibling::to" <co xml:id="programlisting_memo_four_xsl_preceding"/>> <xsl:value-of select="."/> <xsl:text> </xsl:text> </xsl:for-each> <xsl:text> &lt;----</xsl:text> <xsl:value-of select="."/> <co xml:id="programlisting_memo_four_xsl_context"/> <xsl:text>----&gt; </xsl:text> <xsl:for-each select="following-sibling::to"> <co xml:id="programlisting_memo_four_xsl_following"/> <xsl:value-of select="."/> <xsl:text> </xsl:text> </xsl:for-each> <xsl:value-of select="$newline"/> </xsl:template></programlisting> <calloutlist> <callout arearefs="programlisting_memo_four_xsl_preceding"> <para>Iterate on the set of recipients <quote>left</quote> of the context node.</para> </callout> <callout arearefs="programlisting_memo_four_xsl_context"> <para>Taking the context node's value embedded in <code><---- ... 
----></code>.</para> </callout> <callout arearefs="programlisting_memo_four_xsl_following"> <para>Iterate on the set of recipients <quote>right</quote> of the context node.</para> </callout> </calloutlist> <para>More formally the set of preceding siblings is defined to be the set of all nodes having the same parent as the context node and appearing <quote>before</quote> the context node. The notion <quote>before</quote> is meant in the sense of a <link xlink:href="http://en.wikipedia.org/wiki/Depth-first_search">depth-first</link> traversal of the document tree. <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> provides different axis definitions, see <uri xlink:href="http://www.w3.org/TR/xpath#axes">http://www.w3.org/TR/xpath#axes</uri> for details. We provide an illustration here:</para> <figure xml:id="disjointAxeSets"> <title>Disjoint <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> axis definitions.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/preceding.fig"/> </imageobject> <caption> <para>The sets defined by ancestor, descendant, following, preceding and self are disjoint. Their union forms the set of all document nodes.</para> </caption> </mediaobject> </figure> <para>Some remarks:<itemizedlist> <listitem> <para>If the context node is already the topmost node i.e. the root node then the sets defined by <code>ancestor</code> and <code>parent</code> are empty.</para> </listitem> <listitem> <para>The <code>parent</code> set <emphasis>always</emphasis> contains zero or one node.</para> </listitem> </itemizedlist></para> </section> <section xml:id="xslChunking"> <title>Splitting documents into chunks</title> <para>Sometimes we want to generate multiple output documents from a single XML source. It may for example be a bad idea to transform a book of 200 printed pages into a <emphasis>single</emphasis> online HTML page. 
Instead we may split each chapter into a separate HTML file and create navigation links between them.</para> <para>We consider a memo document instance. We want to generate one text file for each memo recipient containing just the recipient's name using the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> element <link xlink:href="http://www.w3.org/TR/xslt20/#element-result-document"><xsl:result-document></link>:</para> <programlisting language="none"><xsl:template match="/memo"> <xsl:apply-templates select="to"/> </xsl:template> <xsl:template match="to"> <emphasis role="bold"><xsl:result-document</emphasis> <co xml:id="programlisting_xsl_result_document_main"/> <emphasis role="bold">href="file_{position()}.txt"</emphasis> <co xml:id="programlisting_xsl_result_document_href"/> <emphasis role="bold">method="text"</emphasis> <co xml:id="programlisting_xsl_result_document_method"/>> <xsl:value-of select="."/> <co xml:id="programlisting_xsl_result_document_content"/> <emphasis role="bold"></xsl:result-document></emphasis> </xsl:template></programlisting> <calloutlist> <callout arearefs="programlisting_xsl_result_document_main"> <para>The output from all generating <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> directives will be redirected from standard output to another output channel.</para> </callout> <callout arearefs="programlisting_xsl_result_document_href"> <para>The output will be written to a file named <filename>file_i.txt</filename> with decimal number <code>i</code> ranging from value 1 up to the number of recipients.</para> </callout> <callout arearefs="programlisting_xsl_result_document_method"> <para>The <code>method</code> attribute possibly overrides a value being given in the <tag class="starttag">xsl:output</tag> element. 
We may also redefine <link xlink:href="http://www.w3.org/TR/xslt20/#element-result-document">other attributes</link> from <tag class="starttag">xsl:output</tag> like <code>doctype-{public.system}</code> and the generated file's <code>encoding</code>.</para> </callout> <callout arearefs="programlisting_xsl_result_document_content"> <para>All output being generated in this region gets redirected to the channel specified in <xref linkend="programlisting_xsl_result_document_href"/>.</para> </callout> </calloutlist> <qandaset defaultlabel="qanda" xml:id="example_book_chunk"> <title>Splitting book into chapter files</title> <qandadiv> <qandaentry> <question> <para>Extend your solution of <xref linkend="example_book_xsl_mixed"/> by writing each <tag class="starttag">chapter</tag>'s content into a separate Xhtml file. In addition create a file <filename>index.html</filename> which contains references to the corresponding <tag class="starttag">chapter</tag> documents. Thus for a document instance with two chapters the overall navigation structure is illustrated by <xref linkend="figure_book_navigation"/>.</para> <para>Implementing the <tag class="starttag">link</tag> tag may cause a problem: An internal link may reference a <tag class="starttag">para</tag>. You need to identify the <tag class="starttag">chapter</tag> node embedding this para. This may be done by using a suitable <abbrev xlink:href="http://www.w3.org/TR/xpath">XPath</abbrev> axis direction.</para> </question> <answer> <para>The full source code of the solution is available at <link xlink:href="Ref/src/Dtd/book/v5/book2chunks.1.xsl">(Online HTML version) ... book2chunks.1.xsl</link>. 
First we generate the table of contents file <filename>index.html</filename>:</para> <programlisting language="none"><xsl:template match="/"> <xsl:result-document href="index.html"> <xsl:apply-templates select="book"/> </xsl:result-document> <xsl:for-each select="book/chapter"> <xsl:result-document href="{generate-id(.)}.html"> <xsl:apply-templates select="."/> </xsl:result-document> </xsl:for-each> </xsl:template> <xsl:template match="book"> <html> <head><title><xsl:value-of select="title"/></title></head> <body> <h1><xsl:value-of select="title"/></h1> <h2>Table of contents</h2> <ul> <xsl:for-each select="<emphasis role="bold">chapter</emphasis>"> <li><a href="{<emphasis role="bold">generate-id(.)</emphasis>}.html"><xsl:value-of select="title"/></a></li> </xsl:for-each> </ul> </body> </html> </xsl:template></programlisting> <para>The <tag class="starttag">link ref="..."</tag> may reference a <tag class="starttag">chapter</tag> or a <tag class="starttag">para</tag>. So we may need to <quote>step up</quote> from a paragraph to the corresponding chapter node:</para> <programlisting language="none"><xsl:template match="link"> <xsl:variable name="reftargetNode" select="id(@linkend)"/> <xsl:variable name="reftargetParentChapter" select="$reftargetNode/ancestor-or-self::chapter"/> <a href="{generate-id($reftargetParentChapter)}.html#{ generate-id($reftargetNode)}"> <xsl:value-of select="."/> </a> </xsl:template></programlisting> <para>This is consistent since <emphasis>all</emphasis> <tag class="starttag">p</tag> nodes in the generated XHTML receive a unique <code>id</code> value regardless of whether the originating <tag class="starttag">para</tag> node does have one.</para> </answer> </qandaentry> </qandadiv> </qandaset> <figure xml:id="figure_book_navigation"> <title>A <tag class="starttag">book</tag> document with two chapters</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/booknavigate.fig"/> </imageobject> </mediaobject> </figure> </section> </section> 
</section> </chapter> <chapter xml:id="xmlApis"> <title><abbrev xlink:href="http://en.wikipedia.org/wiki/Api">API</abbrev>s for XML document processing</title> <section xml:id="sax"> <title>The Simple API for XML</title> <section xml:id="saxPrinciple"> <title>The principle of a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> application</title> <para>We are already familiar with transformations of XML document instances to other formats. Sometimes the capabilities offered by a given transformation approach do not suffice for the problem at hand. A general purpose programming language like <link linkend="gloss_Java"><trademark>Java</trademark></link> offers more powerful means to perform advanced manipulations of XML document trees.</para> <para>Before diving into technical details we present an example exceeding the limits of our present transformation capabilities. We want to format an XML catalog document with article descriptions as HTML. The price information, however, shall reside in a database external to the XML document, namely an RDBMS:</para> <figure xml:id="saxRdbmsAccessPrinciple"> <title>Generating HTML from an XML document and an RDBMS.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/saxxmlrdbms.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>Our catalog might look like:</para> <figure xml:id="simpleCatalog"> <title>An <link linkend="gloss_XML"><abbrev>XML</abbrev></link> based catalog.</title> <programlisting language="none"><catalog> <item orderNo="<emphasis role="bold">3218</emphasis>">Swinging headset</item> <item orderNo="<emphasis role="bold">9921</emphasis>">200W Stereo Amplifier</item> </catalog></programlisting> </figure> <para>The RDBMS may hold some relation with a field <code>orderNo</code> as primary key and a corresponding attribute like <code>price</code>. 
In a real world application <code>orderNo</code> should probably be an integer typed <code>IDENTITY</code> attribute.</para> <figure xml:id="saxRdbmsSchema"> <title>A relation containing price information.</title> <programlisting language="none">CREATE TABLE Product ( orderNo CHAR(10) PRIMARY KEY ,price Money ) INSERT INTO Product VALUES('<emphasis role="bold">3218</emphasis>', 42.57) INSERT INTO Product VALUES('<emphasis role="bold">9921</emphasis>', 121.50)</programlisting> <caption> <para>Prices depend on article numbers.</para> </caption> </figure> <para>The intended HTML output with order numbers being highlighted looks like:</para> <figure xml:id="saxPriceOut"> <title>HTML generated output.</title> <programlisting language="none"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head><title>Available products</title></head> <body> <table border="1"> <tbody> <tr> <th><emphasis role="bold">Order number</emphasis></th> <th>Price</th> <th>Product</th> </tr> <tr> <td><emphasis role="bold">3218</emphasis></td> <td>42,57</td> <td>Swinging headset</td> </tr> <tr> <td><emphasis role="bold">9921</emphasis></td> <td>121,50</td> <td>200W Stereo Amplifier</td> </tr> </tbody> </table> </body> </html></programlisting> <caption> <para>The resulting HTML document contains content both from our XML document and from the database table <code>Product</code>.</para> </caption> </figure> <para>The intended transformation is beyond the XSLT standard's processing capabilities: XSLT does not enable us to access RDBMS content. However some XSLT processors provide extensions for this task.</para> <para>It is tempting to write a <link linkend="gloss_Java"><trademark>Java</trademark></link> application which might use e.g. <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> for database access. But how do we actually read and parse an XML file? 
Sticking to the <link linkend="gloss_Java"><trademark>Java</trademark></link> standard we might use a <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/java/io/FileInputStream.html">FileInputStream</link> instance to read from <code>catalog.xml</code> and write an XML parser ourselves. Fortunately <orgname>SUN</orgname>'s <trademark xlink:href="http://www.oracle.com/technetwork/java/javase">JDK</trademark> already includes an API called <acronym xlink:href="http://www.saxproject.org">SAX</acronym>, the <emphasis>S</emphasis>imple <emphasis>A</emphasis>PI for <emphasis>X</emphasis>ML. The <productname xlink:href="http://www.oracle.com/technetwork/java/javase/jdk-7-readme-429198.html">JDK</productname> also includes a corresponding parser implementation. In addition there are third party <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parser implementations available like <productname xlink:href="http://xerces.apache.org">Xerces</productname> from the <orgname xlink:href="http://www.apache.org">Apache Foundation</orgname>.</para> <para>The <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API is event based and will be illustrated by the relationship between customers and a software vendor company:</para> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/updateinfo.fig"/> </imageobject> </mediaobject> <para>After purchasing software customers are asked to register their software. This way the vendor receives the customer's address. Each time a new release is completed all registered customers will receive a notification typically including a <quote>special offer</quote> to upgrade their software. 
From an abstract point of view the following two actions take place:</para> <variablelist> <varlistentry> <term>Registration</term> <listitem> <para>The customer registers at the company's site indicating its interest in updated versions.</para> </listitem> </varlistentry> <varlistentry> <term>Notification</term> <listitem> <para>Upon completion of each new software release (considered to be an <emphasis>event</emphasis>) a message is sent to all registered customers.</para> </listitem> </varlistentry> </variablelist> <para>The same principle applies to GUI applications in software development. A key press <emphasis>event</emphasis> for example will be forwarded by an application's <emphasis>event handler</emphasis> to a callback function (sometimes called a <emphasis>handler</emphasis> method) being implemented by an application developer. The <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API works the same way: A parser reads an XML document generating events which <emphasis>may</emphasis> be handled by an application. During document parsing the XML tree structure gets <quote>flattened</quote> to a sequence of events:</para> <figure xml:id="saxFlattenEvent"> <title>Parsing an XML document creates a corresponding sequence of events.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/saxmodel.pdf"/> </imageobject> </mediaobject> </figure> <para>An application may register components with the parser:</para> <figure xml:id="figureSax"> <title><acronym xlink:href="http://www.saxproject.org">SAX</acronym> Principle</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/saxapparch.pdf"/> </imageobject> <caption> <para>A <acronym xlink:href="http://www.saxproject.org">SAX</acronym> application consists of a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parser and an implementation of event handlers being specific to the application. 
The application is developed by implementing the two handlers.</para> </caption> </mediaobject> </figure> <para>An Error Handler is required since the XML stream may contain errors. In order to implement a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> application we have to:</para> <orderedlist> <listitem> <para>Instantiate required objects:</para> <itemizedlist> <listitem> <para>Parser</para> </listitem> <listitem> <para>Event Handler</para> </listitem> <listitem> <para>Error Handler</para> </listitem> </itemizedlist> </listitem> <listitem> <para>Register handler instances</para> <itemizedlist> <listitem> <para>register Event Handler to Parser</para> </listitem> <listitem> <para>register Error Handler to Parser</para> </listitem> </itemizedlist> </listitem> <listitem> <para>Start the parsing process by calling the parser's appropriate method.</para> </listitem> </orderedlist> </section> <section xml:id="saxIntroExample"> <title>First steps</title> <para>Our first <acronym xlink:href="http://www.saxproject.org">SAX</acronym> toy application <classname>sax.stat.v1.ElementCount</classname> shall simply count the number of elements it finds in an arbitrary XML document. In addition the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> events shall be written to standard output generating output sketched in <xref linkend="saxFlattenEvent"/>. The application's central implementation reads:</para> <figure xml:id="saxElementCount"> <title>Counting XML elements.</title> <programlisting language="none">package sax.stat.v1; ... 
public class ElementCount { public void parse(final String uri) { try { final SAXParserFactory saxPf = SAXParserFactory.newInstance(); final SAXParser saxParser = saxPf.newSAXParser(); saxParser.parse(uri, eventHandler); } catch (ParserConfigurationException e){ e.printStackTrace(System.err); } catch (org.xml.sax.SAXException e) { e.printStackTrace(System.err); } catch (IOException e){ e.printStackTrace(System.err); } } public int getElementCount() { return eventHandler.getElementCount(); } private final MyEventHandler eventHandler = new MyEventHandler(); }</programlisting> <caption> <para>This application works for arbitrary well-formed XML documents.</para> </caption> </figure> <para>We now explain this application in detail. The first part deals with the instantiation of a parser:</para> <programlisting language="none">try { final SAXParserFactory saxPf = <emphasis role="bold">SAXParserFactory</emphasis>.newInstance(); final SAXParser saxParser = saxPf.newSAXParser(); saxParser.parse(uri, eventHandler); } catch (ParserConfigurationException e){ e.printStackTrace(System.err); } ...</programlisting> <para>In order to keep an application independent from a specific parser implementation the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API uses the so-called <link xlink:href="http://www.dofactory.com/Patterns/PatternAbstract.aspx">Abstract Factory Pattern</link> instead of simply calling a constructor of a vendor specific parser class.</para> <para>In order to be useful the parser has to be instructed to do something meaningful when an XML document gets parsed. For this purpose our application supplies an event handler instance:</para> <programlisting language="none">public void parse(final String uri) { try { final SAXParserFactory saxPf = SAXParserFactory.newInstance(); final SAXParser saxParser = saxPf.newSAXParser(); saxParser.parse(uri, <emphasis role="bold">eventHandler</emphasis>); } catch (org.xml.sax.SAXException e) { ... 
private final MyEventHandler <emphasis role="bold">eventHandler = new MyEventHandler()</emphasis>; }</programlisting> <para>What does the event handler actually do? It offers methods to the parser being callable during the parsing process:</para> <programlisting language="none">package sax.stat.v1; ... public class MyEventHandler extends <classname>org.xml.sax.helpers.DefaultHandler</classname> { public void <emphasis role="bold"><emphasis role="bold">startDocument()</emphasis></emphasis><co xml:id="programlisting_eventhandler_startDocument"/> { System.out.println("Opening Document"); } public void <emphasis role="bold">endDocument()</emphasis><co xml:id="programlisting_eventhandler_endDocument"/> { System.out.println("Closing Document"); } public void <emphasis role="bold">startElement(String namespaceUri, String localName, String rawName, Attributes attrs)</emphasis> <co xml:id="programlisting_eventhandler_startElement"/>{ System.out.println("Opening \"" + rawName + "\""); elementCount++; } public void <emphasis role="bold">endElement(String namespaceUri, String localName, String rawName)</emphasis><co xml:id="programlisting_eventhandler_endElement"/>{ System.out.println("Closing \"" + rawName + "\""); } public void <emphasis role="bold">characters(char[] ch, int start, int length)</emphasis><co xml:id="programlisting_eventhandler_characters"/>{ System.out.println("Content \"" + new String(ch, start, length) + '"'); } public int getElementCount() <co xml:id="programlisting_eventhandler_getElementCount"/>{ return elementCount; } private int elementCount = 0; }</programlisting> <calloutlist> <callout arearefs="programlisting_eventhandler_startDocument"> <para>This method gets called exactly once namely when opening the XML document as a whole.</para> </callout> <callout arearefs="programlisting_eventhandler_endDocument"> <para>After successfully parsing the whole document instance this method will finally be called.</para> </callout> <callout 
arearefs="programlisting_eventhandler_startElement"> <para>This method gets called each time the start of an element is encountered. In the given <filename>catalog.xml</filename> example it will be called three times: once when the <tag class="starttag">catalog</tag> element appears and then once for each of the two <tag class="starttag">item ...</tag> elements. The supplied parameters depend on whether or not namespace processing is enabled.</para> </callout> <callout arearefs="programlisting_eventhandler_endElement"> <para>Called each time an element like <tag class="starttag">item ...</tag> gets closed by its counterpart <tag class="endtag">item</tag>.</para> </callout> <callout arearefs="programlisting_eventhandler_characters"> <para>This method is responsible for the treatment of textual content i.e. handling <code>#PCDATA</code> element content. We will explain its uncommon signature a little bit later.</para> </callout> <callout arearefs="programlisting_eventhandler_getElementCount"> <para><function>getElementCount()</function> is a getter method providing read-only access to the private field <varname>elementCount</varname> which gets incremented in <coref linkend="programlisting_eventhandler_startElement"/> each time an XML element opens.</para> </callout> </calloutlist> <para>The call <code>saxParser.parse(uri, eventHandler)</code> actually initiates the parsing process and tells the parser to:</para> <itemizedlist> <listitem> <para>Open the XML document being referenced by the URI argument.</para> </listitem> <listitem> <para>Forward XML events to the event handler instance supplied by the second argument.</para> </listitem> </itemizedlist> <para>A driver class containing a <code>main(...)</code> method may start the whole process and print out the desired number of elements upon completion of a parsing run:</para> <programlisting language="none">package sax.stat.v1; public class ElementCountDriver { public static void main(String[] argv) { ElementCount xmlStats = new ElementCount(); xmlStats.parse("<emphasis 
role="bold">Input/Sax/catalog.xml</emphasis>"); System.out.println("Document contains " + xmlStats.<emphasis role="bold">getElementCount()</emphasis> + " elements"); } }</programlisting> <para>Processing the catalog example instance yields:</para> <programlisting language="none">Opening Document <emphasis role="bold">Opening "catalog"</emphasis> <co xml:id="programlisting_catalog_output"/> Content " " <emphasis role="bold">Opening "item"</emphasis> <co xml:id="programlisting_catalog_item1"/> Content "Swinging headset" Closing "item" Content " " <emphasis role="bold">Opening "item"</emphasis> <co xml:id="programlisting_catalog_item2"/> Content "200W Stereo Amplifier" Closing "item" Content " " Closing "catalog" Closing Document <emphasis role="bold">Document contains 3 elements</emphasis> <co xml:id="programlisting_catalog_elementcount"/></programlisting> <calloutlist> <callout arearefs="programlisting_catalog_output"> <para>Start parsing element <tag class="starttag">catalog</tag>.</para> </callout> <callout arch="" arearefs="programlisting_catalog_item1"> <para>Start parsing element <tag class="starttag">item orderNo="3218"</tag>Swinging headset<tag class="endtag" role="">item</tag>.</para> </callout> <callout arch="" arearefs="programlisting_catalog_item2"> <para>Start parsing element <tag class="starttag">item orderNo="9921"</tag>200W Stereo Amplifier<tag class="endtag" role="">item</tag>.</para> </callout> <callout arearefs="programlisting_catalog_elementcount"> <para>After the parsing process has completed the application outputs the number of elements counted.</para> </callout> </calloutlist> <para>The output contains some lines of <quote>empty</quote> content. This content is due to whitespace being located between elements. For example a newline appears between the <tag class="starttag">catalog</tag> and the first <tag class="starttag">item</tag> element. 
The parser reports this whitespace by a call to the <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)">characters</link> method. In an application such calls will typically be ignored. XML document instances in a professional context often do not contain any newline characters at all. Instead the whole document is represented as a single line. This impairs human readability, which is not required as long as the processing applications work well. In this case empty content as above will not appear.</para> <para>The <code>characters(char[] ch, int start, int length)</code> method's signature looks somewhat strange regarding <link linkend="gloss_Java"><trademark>Java</trademark></link> conventions. One might expect <code>characters(String s)</code>. But this way the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API allows efficient parser implementations: A parser may initially allocate a reasonably large <code>char</code> array of say 128 bytes sufficient to hold 64 (<link xlink:href="http://unicode.org">Unicode</link>) characters. If this buffer gets exhausted the parser might allocate a second buffer of double size thus implementing an <quote>amortized doubling</quote> algorithm:</para> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/saxcharacter.pdf"/> </imageobject> </mediaobject> <para>In this example the first element content fits in the first buffer. The second content <code>200W Stereo Amplifier</code> and the third content <code>Earphone</code> both fit in the second buffer. Subsequent content may require further buffer allocations. 
Such a strategy minimizes the number of time consuming <code>new </code> <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html">String</link> <code>(...)</code> constructor calls being necessary for the more convenient API variant <code>characters(String s)</code>.</para> </section> <section xml:id="saxRegistry"> <title>Event- and error handler registration</title> <para>Our first <acronym xlink:href="http://www.saxproject.org">SAX</acronym> application suffers from the following deficiencies:</para> <itemizedlist> <listitem> <para>The error handling is very sparse. It relies completely on exceptions like <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/SAXException.html">SAXException</link> being thrown, which frequently do not supply meaningful error information.</para> </listitem> <listitem> <para>The application is not aware of namespaces. Thus reading e.g. <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> document instances will not allow us to distinguish between elements from different namespaces like HTML.</para> </listitem> <listitem> <para>The parser will not validate a document instance against a schema even if one is present.</para> </listitem> </itemizedlist> <para>We now incrementally add these features to the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parsing process. <acronym xlink:href="http://www.saxproject.org">SAX</acronym> offers an interface <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/XMLReader.html">XMLReader</link> to conveniently <emphasis>register</emphasis> event- and error handler instances independently instead of passing both interfaces as a single argument to the <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/SAXParser.html#parse(java.lang.String,%20org.xml.sax.helpers.DefaultHandler)">parse</link> method. 
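A minimal sketch of such a growing buffer may look as follows. The class <classname>GrowingCharBuffer</classname> and its methods are hypothetical and not taken from any real parser. As a simplification the sketch copies existing content into an enlarged array rather than keeping separate buffers: <programlisting language="java">import java.util.Arrays;

/** Hypothetical illustration, not taken from any real parser. */
public class GrowingCharBuffer {

    private char[] buffer = new char[64]; // 64 chars, i.e. 128 bytes
    private int used = 0;                 // number of chars stored so far

    /** Copies text into the buffer, doubling its size while it is too small.
     *  Returns the start index to be passed on to characters(...). */
    public int append(final String text) {
        while (buffer.length - used < text.length()) {
            buffer = Arrays.copyOf(buffer, 2 * buffer.length);
        }
        final int start = used;
        text.getChars(0, text.length(), buffer, start);
        used += text.length();
        return start;
    }

    public char[] chars() { return buffer; }
    public int capacity() { return buffer.length; }

    public static void main(final String[] args) {
        final GrowingCharBuffer b = new GrowingCharBuffer();
        final String content = "200W Stereo Amplifier";
        final int start = b.append(content);
        // A handler would receive (b.chars(), start, content.length())
        // and decide by itself whether it needs a String at all.
        System.out.println(new String(b.chars(), start, content.length()));
    }
}</programlisting> 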
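A minimal sketch of such a registration is given here. It rests on several assumptions: the class <classname>RegistrationSketch</classname> is a hypothetical name, the document is parsed from a string for brevity and a single <classname>org.xml.sax.helpers.DefaultHandler</classname> instance serves as a placeholder for both handler roles: <programlisting language="java">import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class, not part of the lecture sources. */
public class RegistrationSketch {

    /** Parses the given XML text and returns its number of elements. */
    public static int countElements(final String xml) throws Exception {
        final int[] count = {0};
        final DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String namespaceUri, String localName,
                                     String rawName, Attributes attrs) {
                count[0]++;
            }
        };
        final XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler); // receives startElement() etc.
        reader.setErrorHandler(handler);   // receives warning(), error(), fatalError()
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(final String[] args) throws Exception {
        System.out.println(countElements("<catalog><item>Swinging headset</item>"
            + "<item>200W Stereo Amplifier</item></catalog>"));
    }
}</programlisting> 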
We first code an error handler class by implementing the interface <classname>org.xml.sax.ErrorHandler</classname> being part of the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API:</para> <programlisting language="none">package sax.stat.v2; ... public class MyErrorHandler implements ErrorHandler { <emphasis role="bold">public void warning(SAXParseException e)</emphasis> { System.err.println("[Warning]" + getLocationString(e)); } <emphasis role="bold">public void error(SAXParseException e)</emphasis> { System.err.println("[Error]" + getLocationString(e)); } <emphasis role="bold">public void fatalError(SAXParseException e)</emphasis> throws SAXException{ System.err.println("[Fatal Error]" + getLocationString(e)); } private String getLocationString(SAXParseException e) { return " line " + e.getLineNumber() + ", column " + e.getColumnNumber()+ ":" + e.getMessage(); } }</programlisting> <para>These three methods represent the <classname>org.xml.sax.ErrorHandler</classname> interface. The method <function>getLocationString</function> is used to supply precise parsing error locations by means of line- and column numbers within a document instance. 
If errors or warnings are encountered the parser will call one of the appropriate public methods:</para> <figure xml:id="saxMissItem"> <title>A non well formed document.</title> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <catalog> <item orderNo="3218">Swinging headset</item> <item orderNo="9921">200W Stereo Amplifier </catalog></programlisting> <caption> <para>This document is not well formed since a closing <tag class="endtag">item</tag> tag is missing.</para> </caption> </figure> <para>Our error handler method gets called yielding an informative message:</para> <programlisting language="none">[Fatal Error] line 5, column -1:Expected "</item>" to terminate element starting on line 4.</programlisting> <para>This error output is achieved by <emphasis>registering</emphasis> an instance of <classname>sax.stat.v2.MyErrorHandler</classname> with the parser prior to starting the parsing process. In the following code snippet we also register a content handler instance with the parser and thus separate the parser's configuration from its invocation:</para> <programlisting language="none">package sax.stat.v2; ... 
public class ElementCount { public ElementCount() throws SAXException, ParserConfigurationException{ final SAXParserFactory saxPf = SAXParserFactory.newInstance(); final SAXParser saxParser = saxPf.newSAXParser(); xmlReader = saxParser.getXMLReader(); xmlReader.setContentHandler(eventHandler); <co xml:id="programlisting_assemble_parser_setcontenthandler"/> xmlReader.setErrorHandler(errorHandler); <co xml:id="programlisting_assemble_parser_seterrorhandler"/> } public void parse(final String uri) throws IOException, SAXException{ xmlReader.parse(uri); <co xml:id="programlisting_assemble_parser_invokeparse"/> } public int getElementCount() { return eventHandler.getElementCount(); <co xml:id="programlisting_assemble_parser_getelementcount"/> } private final XMLReader xmlReader; private final MyEventHandler eventHandler = new MyEventHandler(); <co xml:id="programlisting_assemble_parser_createeventhandler"/> private final MyErrorHandler errorHandler = new MyErrorHandler(); <co xml:id="programlisting_assemble_parser_createerrorhandler"/> }</programlisting> <calloutlist> <callout arearefs="programlisting_assemble_parser_setcontenthandler programlisting_assemble_parser_seterrorhandler"> <para>Referring to <xref linkend="figureSax" os=""/> these two calls attach the event- and error handler objects to the parser thus implementing the two arrows from the parser to the application's implementation.</para> </callout> <callout arearefs="programlisting_assemble_parser_invokeparse"> <para>The parser is invoked. 
Note that in this example we only pass a document's URI but no reference to a handler object.</para> </callout> <callout arearefs="programlisting_assemble_parser_getelementcount"> <para>The method <function>getElementCount()</function> is needed to allow a calling object to access the private <varname>eventHandler</varname> object's <function>getElementCount()</function> method.</para> </callout> <callout arearefs="programlisting_assemble_parser_createeventhandler programlisting_assemble_parser_createerrorhandler"> <para>An event handling and an error handling object are created to handle events during the parsing process.</para> </callout> </calloutlist> <para>The careful reader might notice a subtle difference between the content- and the error handler implementation: The class <classname>sax.stat.v2.MyErrorHandler</classname> implements the interface <classname>org.xml.sax.ErrorHandler</classname>. But <classname>sax.stat.v2.MyEventHandler</classname> is derived from <classname>org.xml.sax.helpers.DefaultHandler</classname> which itself implements the <classname>org.xml.sax.ContentHandler</classname> interface. Actually one might as well start from the latter interface, which requires implementing all of its 11 methods. In most circumstances this only complicates the application's code since it is unnecessary to react to events belonging for example to processing instructions. 
For this reason it is good coding practice to use the empty default implementations in <classname>org.xml.sax.helpers.DefaultHandler</classname> and to redefine only those methods corresponding to events actually being handled by the application in question.</para> <qandaset defaultlabel="qanda" xml:id="sda1SaxReadAttributes"> <title>SAX and attribute values</title> <qandadiv> <qandaentry> <question> <label>Reading an element's set of attributes.</label> <para>The example document instance does include <tag class="attribute">orderNo</tag> attribute values for each <tag class="starttag">item</tag> element. Our application does not yet show these attribute keys and their corresponding values. Read the documentation for <classname xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/Attributes.html">org.xml.sax.Attributes</classname> and extend the given code to use it.</para> <para>You should start from the <xref linkend="glo_MIB"/> Maven archetype <code>mi-maven-archetype-sax</code>. Configuration hints are available at <uri xlink:href="http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.swd1/sw1Resources.html">http://www.mi.hdm-stuttgart.de/freedocs/topic/de.hdm_stuttgart.mi.swd1/sw1Resources.html</uri>.</para> </question> <answer> <para>For the given example it would suffice to read the known <tag class="attribute">orderNo</tag> attribute's value. 
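</para> <para>A minimal sketch of this simpler variant might look as follows. The class name <classname>OrderNoHandler</classname>, its accessor method and the inline sample document are assumptions made for illustration; only <methodname>Attributes.getValue(String)</methodname> is part of the SAX API:</para> <programlisting language="java">import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler reading only the known orderNo attribute. */
public class OrderNoHandler extends DefaultHandler {

    private final StringBuilder orderNumbers = new StringBuilder();

    @Override
    public void startElement(String namespaceUri, String localName,
            String rawName, Attributes attrs) {
        // getValue(String) returns null if the element lacks the attribute.
        final String orderNo = attrs.getValue("orderNo");
        if (orderNo != null) {
            orderNumbers.append(rawName).append(" orderNo=")
                        .append(orderNo).append('\n');
        }
    }

    public String getOrderNumbers() { return orderNumbers.toString(); }

    public static void main(String[] args) throws Exception {
        final String xml =
              "<catalog><item orderNo=\"3218\">Swinging headset</item>"
            + "<item orderNo=\"9921\">200W Stereo Amplifier</item></catalog>";
        final OrderNoHandler handler = new OrderNoHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        System.out.print(handler.getOrderNumbers());
    }
}</programlisting> <para>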
A generic solution may ask for the set of all defined attributes and show their values:</para> <programlisting language="none">package sax; import org.xml.sax.Attributes; import org.xml.sax.helpers.DefaultHandler; public class AttribEventHandler extends DefaultHandler { @Override public void startElement(String namespaceUri, String localName, String rawName, Attributes attrs) { System.out.println("Opening Element " + rawName); for (int i = 0; i < attrs.getLength(); i++){ System.out.println(attrs.getQName(i) + "=" + attrs.getValue(i) + "\n"); } } }</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> <section xml:id="sda1SecElementLists"> <title>The set of element names</title> <qandaset defaultlabel="qanda" xml:id="sda1QandaElementNames"> <title>Element lists of arbitrary XML documents.</title> <qandadiv> <qandaentry> <question> <para>We reconsider the simple application reading arbitrary XML documents and listing the XML elements contained within:</para> <programlisting language="none">Opening Document <emphasis role="bold">Opening "catalog"</emphasis> Content " " <emphasis role="bold">Opening "item"</emphasis> Content "Swinging headset" Closing "item" Content " ...</programlisting> <para>If an element e.g. <tag class="starttag">item</tag> appears multiple times it will also be written to standard output multiple times.</para> <para>We are now interested in the list of all element names present in an arbitrary XML document. 
Consider the following example:</para> <programlisting language="none"><memo> <from> <name>Martin</name> <surname>Goik</surname> </from> <to> <name>Adam</name> <surname>Hacker</surname> </to> <to> <name>Eve</name> <surname>Intruder</surname> </to> <date year="2005" month="1" day="6"/> <subject>Firewall problems</subject> <content> <para>Thanks for your excellent work.</para> <para>Our firewall is definitely broken!</para> </content> </memo></programlisting> <para>The elements <tag class="starttag">to</tag>, <tag class="starttag">name</tag>, <tag class="starttag">surname</tag> and <tag class="starttag">para</tag> all appear multiple times. Write a SAX application which processes arbitrary XML documents and creates an alphabetically sorted list of the elements contained <emphasis role="bold">excluding duplicates</emphasis>. The intended output for the above example is:</para> <programlisting language="none">List of elements: {content date from memo name para subject surname to }</programlisting> <para>The corresponding handler should be implemented in a re-usable way. Thus if different XML documents are being handled in succession the list of elements should be erased prior to processing the current document. 
Hints:</para> <itemizedlist> <listitem> <para>Use a <classname>java.util.SortedSet</classname> instance to collect element names thereby excluding duplicates.</para> </listitem> <listitem> <para>The method <methodname>sax.count.ListTagNamesHandler.startDocument()</methodname> may be used to initialize your handler.</para> </listitem> </itemizedlist> </question> <answer> <para>A suitable handler reads:</para> <programlisting language="none">package sax.count; import java.util.SortedSet; import java.util.TreeSet; import org.xml.sax.Attributes; import org.xml.sax.SAXException; import org.xml.sax.helpers.DefaultHandler; /** Collecting the set of element names from element events */ public class ListTagNamesHandler extends DefaultHandler { // A SortedSet by definition does not contain any duplicates. private SortedSet<String> elementNames = new TreeSet<>(); @Override public void startDocument() throws SAXException { elementNames.clear(); // May contain elements from a previous run. } @Override public void startElement(String namespaceUri, String localName, String rawName, Attributes attrs) { // In case the current element name has already been inserted // this method call will be silently ignored. elementNames.add(rawName); } /** * @return A sorted list of element names of the currently processed XML * document without duplicates. 
*/ public String[] getTagNames() { return elementNames.toArray(new String[0]); } }</programlisting> <para>A complete application requires a driver:</para> <programlisting language="none">package sax.count; import javax.xml.parsers.SAXParser; import javax.xml.parsers.SAXParserFactory; import org.xml.sax.XMLReader; import sax.stat.v2.MyErrorHandler; public class Driver { public static void main(String argv[]) throws Exception { final SAXParserFactory saxPf = SAXParserFactory.newInstance(); final SAXParser saxParser = saxPf.newSAXParser(); final XMLReader xmlReader = saxParser.getXMLReader(); final ListTagNamesHandler handler = new ListTagNamesHandler(); xmlReader.setContentHandler(handler); xmlReader.setErrorHandler(new MyErrorHandler()); xmlReader.parse("Input/Xml/Memo/message.xml"); System.out.print("List of elements: {"); for (String elementName : handler.getTagNames()) { System.out.print(elementName + " "); } System.out.println("}"); } }</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="sda1SaxView"> <title>A limited view on a given XML document instance</title> <qandaset defaultlabel="qanda" xml:id="sda1QandamemoView"> <title>A specific view on memo documents</title> <qandadiv> <qandaentry> <question> <para>We reconsider the following memo instance:</para> <programlisting language="none"><memo> <from> <name>Martin</name> <surname>Goik</surname> </from> <to> <name>Adam</name> <surname>Hacker</surname> </to> <to> <name>Eve</name> <surname>Intruder</surname> </to> <date year="2005" month="1" day="6"/> <subject>Firewall problems</subject> <content> <para>Thanks for your excellent work.</para> <para>Our firewall is definitely broken!</para> </content> </memo></programlisting> <para>Every memo instance does have exactly one sender and one subject. 
Write a SAX application to achieve the following output:</para> <programlisting language="none">Sender: Martin Goik Subject: Firewall problems</programlisting> <para>Hint: The callback implementation of <methodname>org.xml.sax.helpers.DefaultHandler.characters(char[],int,int)</methodname> may be used to filter the desired output. You have to limit its output to <tag class="starttag">from</tag> and <tag class="starttag">subject</tag> descendant content. Taking the <tag class="starttag">subject</tag>Firewall problems<tag class="endtag">subject</tag> element as an example the corresponding event sequence reads:</para> <informaltable border="1"> <tr> <th>Event</th> <th>Corresponding callback</th> </tr> <tr> <td>...</td> <td>...</td> </tr> <tr> <td>Opening <tag class="starttag">subject</tag> element</td> <td>startElement(...)</td> </tr> <tr> <td>Firewall problems</td> <td>characters(...)</td> </tr> <tr> <td>Closing <tag class="endtag">subject</tag> element</td> <td>endElement(...)</td> </tr> <tr> <td>...</td> <td>...</td> </tr> </informaltable> <para>Limiting output of our <methodname>org.xml.sax.helpers.DefaultHandler.characters(char[],int,int)</methodname> callback method can be achieved by introducing instance scope boolean variables being set to true or false inside your <methodname>org.xml.sax.helpers.DefaultHandler.startElement(String uri,String localName,String qName,org.xml.sax.Attributes attributes)</methodname> and <methodname>org.xml.sax.helpers.DefaultHandler.endElement(String uri, String localName, String qName)</methodname> implementations accordingly to keep track of the current event state.</para> </question> <answer> <programlisting language="none">package sax.view; ... /** A view on memo documents restricting to sender name and subject. */ public class MemoViewHandler extends DefaultHandler { // These variables help us to keep track of the current event state spanning // each startElement(...) -- characters(...) -- endElement(...) 
event sequence boolean inFromContext = false, inSubjectContext = false; public void startElement(String namespaceUri, String localName, String rawName, Attributes attrs) { switch(rawName) { case "from": inFromContext = true; System.out.print("Sender: "); break; case "subject": inSubjectContext = true; System.out.print("Subject: "); break; case "surname": if (inFromContext) { System.out.print(" "); // Adding additional space between <name> and <surname> content. } break; } } @Override public void endElement(String uri, String localName, String rawName) throws SAXException { switch(rawName) { case "from": inFromContext = false; System.out.println(); break; case "subject": inSubjectContext = false; System.out.println(); break; } } @Override public void characters(char[] ch, int start, int length) throws SAXException { if (inFromContext || inSubjectContext) { System.out.print(new String(ch, start, length)); } } }</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> </section> <section xml:id="saxValidate"> <title><acronym xlink:href="http://www.saxproject.org">SAX</acronym> validation</title> <para>So far we only parsed well formed document instances. Our current parser does not check validity and will thus also accept invalid XML instances:</para> <figure xml:id="saxNotValid"> <title>An invalid XML document.</title> <programlisting language="none"><xs:element name="catalog"> <xs:complexType> <xs:sequence> <xs:element ref="item"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="item"> <xs:complexType mixed="true"> <xs:attribute name="orderNo" type="xs:int" use="required"/> </xs:complexType> </xs:element></programlisting> <programlisting language="none"><catalog> <item orderNo="3218">Swinging headset</item> <item orderNo="9921">200W Stereo Amplifier</item> <emphasis role="bold"><!-- second entry forbidden by schema --></emphasis> </catalog></programlisting> <caption> <para>In contrast to <xref linkend="saxMissItem"/> this document is well formed. 
But it is not <emphasis role="bold">valid</emphasis> with respect to the schema since more than one <tag class="starttag">item</tag> element is present.</para> </caption> </figure> <para>This document instance is well-formed but not valid: Only one element <tag class="starttag">item</tag> is allowed due to an ill-defined schema. The parser will not report any error or warning. In order to enable validation we need to configure our parser:</para> <programlisting language="none">xmlReader.setFeature("http://xml.org/sax/features/validation", true);</programlisting> <para>The string <code>http://xml.org/sax/features/validation</code> serves as a key. Since this is an ordinary string value a parser may or may not implement it. The <acronym xlink:href="http://www.saxproject.org">SAX</acronym> standard defines two exception classes for dealing with feature related errors:</para> <variablelist> <varlistentry> <term><link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/SAXNotRecognizedException.html">SAXNotRecognizedException</link></term> <listitem> <para>The feature is not known to the parser.</para> </listitem> </varlistentry> <varlistentry> <term><link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/SAXNotSupportedException.html">SAXNotSupportedException</link></term> <listitem> <para>The feature is known to the parser but the parser either does not support it or does not support the specific value being set.</para> </listitem> </varlistentry> </variablelist> <para>The <productname xlink:href="http://projects.apache.org/projects/xml_commons_resolver.html">xml-commons resolver project </productname>offers an implementation being able to process various catalog file formats. 
Maven based projects allow the corresponding library import by adding the following dependency:</para> <programlisting language="none"><dependency> <groupId>xml-resolver</groupId> <artifactId>xml-resolver</artifactId> <version>1.2</version> </dependency></programlisting> <para>We need a properties file <link xlink:href="http://xerces.apache.org/xml-commons/components/resolver/tips.html">CatalogManager.properties</link> defining XML catalogs to be used and additional parameters:</para> <programlisting language="none"># Catalogs are relative to this properties file relative-catalogs=false # Catalog list catalogs=\ /usr/share/eclipse/dropins/oxygenxml.oxygen_14.2/plugins/com.oxygenxml.editor_14.2.0.v2013021115/frameworks/xhtml/dtd/xhtmlcatalog.xml;\ /usr/share/eclipse/dropins/oxygenxml.oxygen_14.2/plugins/com.oxygenxml.editor_14.2.0.v2013021115/frameworks/xhtml11/dtd/xhtmlcatalog.xml # PUBLIC in favour of SYSTEM prefer=public</programlisting> <para>This configuration uses some catalogs from the <trademark>Oxygen</trademark> <trademark>Eclipse</trademark> plugin. 
We may now add a resolver to our SAX application by referencing the above configuration file <coref linkend="resolverPropertyFile"/> and registering the resolver to our SAX parser instance <coref linkend="resolverRegister"/>:</para> <programlisting language="none">xmlReader = saxParser.getXMLReader(); // Set up resolving PUBLIC identifier final CatalogManager cm = new CatalogManager("<emphasis role="bold">CatalogManager.properties</emphasis>" <co xml:id="resolverPropertyFile"/> ); final CatalogResolver resolver = new CatalogResolver(cm); xmlReader.setEntityResolver(resolver) <co xml:id="resolverRegister"/>;</programlisting> </section> <section xml:id="saxNamespace"> <title>Namespaces</title> <para>In order to make a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parser application namespace aware we have to activate two <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parsing features:</para> <programlisting language="none">xmlReader = saxParser.getXMLReader(); xmlReader.setFeature("http://xml.org/sax/features/namespaces", true); xmlReader.setFeature("http://xml.org/sax/features/namespace-prefixes", true);</programlisting> <para>This instructs the parser to pass the namespace's name for each element. Namespace prefixes like <code>xsl</code> in <tag class="starttag">xsl:for-each</tag> are also passed and may be used by an application:</para> <programlisting language="none">package sax; ... public class NamespaceEventHandler extends DefaultHandler { ... 
public void startElement(String <emphasis role="bold">namespaceUri</emphasis>, String localName, String rawName, Attributes attrs) { System.out.println("Opening Element rawName='" + rawName + "'\n" + "namespaceUri='" + <emphasis role="bold">namespaceUri</emphasis> + "'\n" + "localName='" + localName + "'\n--------------------------------------------"); }</programlisting> <para>As an example we take an XSLT script:</para> <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> <xsl:stylesheet version="1.0" xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:fo='http://www.w3.org/1999/XSL/Format'> <xsl:template match="/"> <fo:block>A block</fo:block> <HTML/> </xsl:template> </xsl:stylesheet></programlisting> <para>This XSLT script being conceived as an XML document instance contains elements belonging to two different namespaces namely <code>http://www.w3.org/1999/XSL/Transform</code> and <code>http://www.w3.org/1999/XSL/Format</code>. The script also contains a <quote>raw</quote> <tag audience="" class="emptytag">HTML</tag> element being introduced only for demonstration purposes belonging to the default namespace. The result reads:</para> <programlisting language="none">Opening Element rawName='xsl:stylesheet' namespaceUri='http://www.w3.org/1999/XSL/Transform' localName='stylesheet' -------------------------------------------- Opening Element rawName='xsl:template' namespaceUri='http://www.w3.org/1999/XSL/Transform' localName='template' -------------------------------------------- Opening Element rawName='fo:block' namespaceUri='http://www.w3.org/1999/XSL/Format' localName='block' -------------------------------------------- Opening Element rawName='HTML' namespaceUri='' localName='HTML'</programlisting> <para>Now the parser tells us to which namespace a given element node belongs. 
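</para> <para>The same information is also available when namespace processing is switched on at the JAXP factory level via <methodname>SAXParserFactory.setNamespaceAware(true)</methodname>. The following self-contained sketch assumes this route instead of the two feature settings shown earlier; the class name <classname>NamespaceCollector</classname> and the <code>{uri}localName</code> notation are chosen for illustration only:</para> <programlisting language="java">import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler recording each element's namespace and local name. */
public class NamespaceCollector extends DefaultHandler {

    private final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String namespaceUri, String localName,
            String rawName, Attributes attrs) {
        // Elements without a namespace are reported with an empty URI.
        names.add("{" + namespaceUri + "}" + localName);
    }

    public List<String> getNames() { return names; }

    public static void main(String[] args) throws Exception {
        final String xsl =
              "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
            + "<xsl:template match='/'><fo:block>A block</fo:block>"
            + "<HTML/></xsl:template></xsl:stylesheet>";

        final SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP switch for namespace processing
        final NamespaceCollector collector = new NamespaceCollector();
        factory.newSAXParser()
               .parse(new InputSource(new StringReader(xsl)), collector);
        for (final String name : collector.getNames()) {
            System.out.println(name);
        }
    }
}</programlisting> <para>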
An XSLT engine for example uses this information to build two classes of elements:</para> <itemizedlist> <listitem> <para>Elements belonging to the namespace <code>http://www.w3.org/1999/XSL/Transform</code> like <tag class="emptytag">xsl:value-of select="..."</tag> have to be interpreted as instructions by the processor.</para> </listitem> <listitem> <para>Elements <emphasis role="bold">not</emphasis> belonging to the namespace <code>http://www.w3.org/1999/XSL/Transform</code> like <tag class="emptytag">html</tag> or <tag class="starttag">fo:block</tag> are copied <quote>as is</quote> to the output.</para> </listitem> </itemizedlist> <qandaset defaultlabel="qanda" xml:id="quandaentry_SqlFromXml"> <title>Generating SQL INSERT statements from XML data</title> <qandadiv> <qandaentry> <question> <para>Consider the following schema and document instance example:</para> <figure xml:id="catalogProductDescriptionsExample"> <title>A sample catalog containing products and corresponding descriptions.</title> <programlisting language="none"><xs:element name="catalog"> <xs:complexType> <xs:sequence> <xs:element ref="product" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="product"> <xs:complexType> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="description" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> <xs:element name="age" type="xs:int" minOccurs="0" maxOccurs="1"/> </xs:sequence> <xs:attribute name="id" type="xs:ID" use="required"/> </xs:complexType> </xs:element></programlisting> <programlisting language="none"><catalog ... 
xsi:noNamespaceSchemaLocation="catalog.xsd"> <product id="mpt"> <name>Monkey Picked Tea</name> <description>Rare wild Chinese tea</description> <description>Picked only by specially trained monkeys</description> </product> <product id="instantTent"> <name>4-Person Instant Tent</name> <description>4-person, 1-room tent</description> <description>Pre-attached tent poles</description> <description>Exclusive WeatherTec system.</description> <age>15</age> </product> </catalog></programlisting> </figure> <para>Data being contained in catalog instances shall be transferred to a relational database system. Implement and test a <link linkend="gloss_SAX"><abbrev>SAX</abbrev></link> application by following the subsequently described steps:</para> <glosslist> <glossentry> <glossterm>Database schema</glossterm> <glossdef> <para>Create a database schema matching a product of your choice (<productname>MySQL</productname>, <productname>Oracle</productname>, ...). Your schema should map type and integrity constraints of the given XML schema. In particular:</para> <itemizedlist> <listitem> <para>The element <tag class="starttag">age</tag> is optional.</para> </listitem> <listitem> <para><tag class="starttag">description</tag> elements are children of <tag class="starttag">product</tag> elements and should thus be modeled by a 1:n relation.</para> </listitem> <listitem> <para>In a catalog the order of descriptions of a given product matters. Thus your schema should allow for descriptions being ordered.</para> </listitem> </itemizedlist> </glossdef> </glossentry> <glossentry> <glossterm>SAX Application</glossterm> <glossdef> <para>The order of appearance of the XML elements <tag class="starttag">product</tag>, <tag class="starttag">name</tag> and <tag class="starttag">age</tag> does not permit a linear generation of suitable SQL <code>INSERT</code> statements by a <link linkend="gloss_SAX"><abbrev>SAX</abbrev></link> content handler. 
Instead you will have to keep copies of local element values when implementing <methodname>org.xml.sax.ContentHandler.startElement(String,String,String,org.xml.sax.Attributes)</methodname> and related callback methods. The following sequence of insert statements corresponds to the XML data being contained in <xref linkend="catalogProductDescriptionsExample"/>. You may use these statements as a blueprint to be generated by your <link linkend="gloss_SAX"><abbrev>SAX</abbrev></link> application:</para> <programlisting language="none"><emphasis role="bold">INSERT INTO Product VALUES ('mpt', 'Monkey Picked Tea', NULL);</emphasis> INSERT INTO Description VALUES('mpt', 0, 'Rare wild Chinese tea'); INSERT INTO Description VALUES('mpt', 1, 'Picked only by specially trained monkeys'); <emphasis role="bold">INSERT INTO Product VALUES ('instantTent', '4-Person Instant Tent', 15);</emphasis> INSERT INTO Description VALUES('instantTent', 0, '4-person, 1-room tent'); INSERT INTO Description VALUES('instantTent', 1, 'Pre-attached tent poles'); INSERT INTO Description VALUES('instantTent', 2, 'Exclusive WeatherTec system.');</programlisting> <para>Provide a suitable <xref linkend="glo_Junit"/> test.</para> </glossdef> </glossentry> </glosslist> </question> <answer> <annotation role="make"> <para role="eclipse">P/catalog2sql</para> </annotation> <para>Running this project and executing tests requires the following Maven project dependency to be installed (e.g. 
locally via <command>mvn</command> <option>install</option>) to satisfy a dependency:</para> <annotation role="make"> <para role="eclipse">P/saxerrorhandler</para> </annotation> <para>Some remarks are in order here:</para> <orderedlist> <listitem> <para>The <xref linkend="glo_SQL"/> database schema might read:</para> <programlisting language="sql">CREATE TABLE Product ( id CHAR(20) NOT NULL PRIMARY KEY <co linkends="catalog2sqlSchema-1" xml:id="catalog2sqlSchema-1-co"/> ,name VARCHAR(255) NOT NULL ,age SMALLINT <co linkends="catalog2sqlSchema-2" xml:id="catalog2sqlSchema-2-co"/> ); CREATE TABLE Description ( product CHAR(20) NOT NULL REFERENCES Product <co linkends="catalog2sqlSchema-3" xml:id="catalog2sqlSchema-3-co"/> ,orderIndex int NOT NULL <co linkends="catalog2sqlSchema-4" xml:id="catalog2sqlSchema-4-co"/> -- preserving the order of descriptions belonging to a given product ,text VARCHAR(255) NOT NULL ,UNIQUE(product, orderIndex) <co linkends="catalog2sqlSchema-5" xml:id="catalog2sqlSchema-5-co"/> );</programlisting> <calloutlist> <callout arearefs="catalog2sqlSchema-1-co" xml:id="catalog2sqlSchema-1"> <para>The primary key constraint implements the uniqueness of <tag class="starttag">product id='xyz'</tag> values</para> </callout> <callout arearefs="catalog2sqlSchema-2-co" xml:id="catalog2sqlSchema-2"> <para>Nullability of <code>age</code> implements <tag class="starttag">age</tag> elements being optional.</para> </callout> <callout arearefs="catalog2sqlSchema-3-co" xml:id="catalog2sqlSchema-3"> <para><tag class="starttag">description</tag> elements being children of <tag class="starttag">product</tag> are being implemented by a foreign key to its identifying owner thus forming weak entities.</para> </callout> <callout arearefs="catalog2sqlSchema-4-co" xml:id="catalog2sqlSchema-4"> <para>The attribute <code>orderIndex</code> allows descriptions to be sorted thus maintaining the original order of appearance of <tag class="starttag">description</tag> 
elements.</para> </callout> <callout arearefs="catalog2sqlSchema-5-co" xml:id="catalog2sqlSchema-5"> <para>The <code>orderIndex</code> attribute is unique within the set of descriptions belonging to the same product.</para> </callout> </calloutlist> </listitem> <listitem> <para>The result of the given input XML sample file should be similar to the content of the supplied reference file <filename>products.reference.xml</filename>:</para> <programlisting language="sql">INSERT INTO Product (id, name) VALUES ('mpt', 'Monkey Picked Tea'); INSERT INTO Description VALUES('mpt', 0, 'Rare wild Chinese tea'); INSERT INTO Description VALUES('mpt', 1, 'Picked only by specially trained monkeys'); -- end of current product entry -- INSERT INTO Product VALUES ('instantTent', '4-Person Instant Tent', 15); INSERT INTO Description VALUES('instantTent', 0, '4-person, 1-room tent'); INSERT INTO Description VALUES('instantTent', 1, 'Pre-attached tent poles'); INSERT INTO Description VALUES('instantTent', 2, 'Exclusive WeatherTec system.'); -- end of current product entry --</programlisting> <para>So a <xref linkend="glo_Junit"/> test may just execute the XML to SQL converter and then compare the effective output to the above reference file.</para> </listitem> </orderedlist> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset defaultlabel="qanda" xml:id="quandaentry_NumElemByNs"> <title>Counting element names grouped by namespaces</title> <qandadiv> <qandaentry> <question> <para>We want to extend the SAX examples counting <link linkend="saxElementCount">elements</link> and <link linkend="exercise_saxAttrib">attributes</link> of arbitrary document instances. 
Consider the following <link linkend="gloss_XHTML">XHTML</link> + <link linkend="gloss_SVG">SVG</link> + <link linkend="gloss_MathML">MathML</link> sample document:</para> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" <co xml:id="xhtmlCombinedNs_Svg"/> xmlns:h="http://www.w3.org/1999/xhtml" <co xml:id="xhtmlCombinedNs_Xhtml"/> exclude-result-prefixes="xs" version="2.0"> <xsl:template match="/"> <h:html> <h:head> <h:title></h:title> </h:head> <h:body> <h:h1>A heading</h:h1> <h:p>A paragraph</h:p> <h:h1>Yet another heading</h:h1> <xsl:apply-templates/> </h:body> </h:html> </xsl:template> <xsl:template match="*"> <xsl:message> <xsl:text>No template defined for element '</xsl:text> <xsl:value-of select="name(.)"/> <xsl:text>'</xsl:text> </xsl:message> </xsl:template> </xsl:stylesheet></programlisting> <para>This XSL stylesheet defines two different namespaces <coref linkend="xhtmlCombinedNs_Svg"/> and <coref linkend="xhtmlCombinedNs_Xhtml"/>.</para> <para>Implement a <link linkend="gloss_SAX">SAX</link> application being able to group elements from arbitrary XML documents by namespaces along with their corresponding frequencies of occurrence. 
The intended output for the previous <xref linkend="glo_XSL"/> example should look like:</para> <programlisting language="none">Namespace '<emphasis role="bold">http://www.w3.org/1999/xhtml</emphasis>' contains: <head> (1 occurrence) <p> (1 occurrence) <h1> (2 occurrences) <html> (1 occurrence) <title> (1 occurrence) <body> (1 occurrence) Namespace '<emphasis role="bold">http://www.w3.org/1999/XSL/Transform</emphasis>' contains: <stylesheet> (1 occurrence) <template> (2 occurrences) <value-of> (1 occurrence) <apply-templates> (1 occurrence) <text> (2 occurrences) <message> (1 occurrence)</programlisting> <para>Hint: Counting frequencies and grouping by namespaces may be achieved by using standard Java container implementations of <classname>java.util.Map</classname>. You may for example map each namespace to the set of element names it contains and map each element name to its number of occurrences. Thus nested maps are required.</para> </question> <answer> <annotation role="make"> <para role="eclipse">P/catalog2sql</para> </annotation> <para>Running this project and executing its tests requires the following Maven project to be installed (e.g. locally via <command>mvn</command> <option>install</option>):</para> <annotation role="make"> <para role="eclipse">P/saxerrorhandler</para> </annotation> <para>The above solution contains both a running application and an (incomplete) <xref linkend="glo_Junit"/> test.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> </section> <section xml:id="dom"> <title>The Document Object Model (<acronym xlink:href="http://www.w3.org/DOM">DOM</acronym>)</title> <titleabbrev><acronym xlink:href="http://www.w3.org/DOM">DOM</acronym></titleabbrev> <section xml:id="domBase"> <title>Language independent specification</title> <titleabbrev>Language independence</titleabbrev> <para>XML documents allow for automated content processing. 
We already discussed the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API for accessing XML documents from <link linkend="gloss_Java"><trademark>Java</trademark></link> applications. There are however situations where <acronym xlink:href="http://www.saxproject.org">SAX</acronym> is not appropriate:</para> <itemizedlist> <listitem> <para><acronym xlink:href="http://www.saxproject.org">SAX</acronym> is event based: XML nodes are passed to handler methods one at a time. Sometimes we want to access neighbouring nodes of a context node in our handler methods, for example a <tag class="starttag">title</tag> following a <tag class="starttag">chapter</tag> node. <acronym xlink:href="http://www.saxproject.org">SAX</acronym> does not offer any support for this. If we need references to neighbouring nodes we have to create them ourselves during a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parsing run. This is tedious and leads to code that is hard to understand.</para> </listitem> <listitem> <para>Some applications may want to select node sets by <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> expressions, which the <acronym xlink:href="http://www.saxproject.org">SAX</acronym> API does not support.</para> </listitem> <listitem> <para>We may want to move subtrees within a document (for example exchanging two <tag class="starttag">chapter</tag> nodes) or even transfer them to a different document.</para> </listitem> </itemizedlist> <para>The greatest deficiency of <acronym xlink:href="http://www.saxproject.org">SAX</acronym> is the fact that an XML instance is not represented as a tree-like structure but as a succession of events. 
The <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> allows us to represent XML document instances as tree-like structures and thus enables navigational operations between nodes.</para> <para>In order to achieve language <emphasis>and</emphasis> software vendor independence the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> approach uses two stages:</para> <itemizedlist> <listitem> <para>The <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> is formulated in an Interface Definition Language (<abbrev xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev>).</para> </listitem> <listitem> <para>In order to use the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> API from a concrete programming language a so-called <emphasis>language binding</emphasis> is required. In languages like <link linkend="gloss_Java"><trademark>Java</trademark></link> the language binding will still be a set of (<link linkend="gloss_Java"><trademark>Java</trademark></link>) interfaces. Thus for actually coding an application an implementation of these interfaces is needed.</para> </listitem> </itemizedlist> <para>So what exactly may an <abbrev xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev> be? The programming language <link linkend="gloss_Java"><trademark>Java</trademark></link> already allows pure interface definitions without any implementation. In C++ the same result can be achieved by abstract classes consisting solely of <emphasis>pure virtual functions</emphasis>. An <abbrev xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev> offers extended features to describe such interfaces. 
For <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> the <productname xlink:href="http://www.omg.org/gettingstarted/corbafaq.htm">CORBA 2.2</productname> <abbrev xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev> had been chosen to describe an XML document programming interface. As a first example we take an excerpt from the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym>'s <link xlink:href="http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-1950641247">Node</link> interface definition:</para> <programlisting language="none">interface Node { // NodeType const unsigned short ELEMENT_NODE = 1; const unsigned short ATTRIBUTE_NODE = 2; const unsigned short TEXT_NODE = 3; ... readonly attribute DOMString nodeName; attribute DOMString nodeValue; // raises(DOMException) on setting // raises(DOMException) on retrieval readonly attribute unsigned short nodeType; readonly attribute Node parentNode; ... readonly attribute NodeList childNodes; readonly attribute Node firstChild; ... Node insertBefore(in Node newChild, in Node refChild) raises(DOMException); ...</programlisting> <para>If we want to implement the <abbrev xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev> <classname>org.w3c.dom.Node</classname> specification in e.g. <link linkend="gloss_Java"><trademark>Java</trademark></link> a language binding has to be defined. This means writing <link linkend="gloss_Java"><trademark>Java</trademark></link> code which closely resembles the <abbrev xlink:href="http://en.wikipedia.org/wiki/Interface_description_language">IDL</abbrev> specification. Obviously this task depends on and is restricted by the constructs being offered by the target programming language. 
The W3C <link xlink:href="http://www.w3.org/TR/DOM-Level-3-Core/java-binding.html">defines</link> the <link linkend="gloss_Java"><trademark>Java</trademark></link> <classname>org.w3c.dom.Node</classname> interface by:</para> <programlisting language="none">package org.w3c.dom; public interface Node { public static final short ELEMENT_NODE = 1; // Node Types public static final short ATTRIBUTE_NODE = 2; public static final short TEXT_NODE = 3; ... public String getNodeName(); public String getNodeValue() throws DOMException; public void setNodeValue(String nodeValue) throws DOMException; public short getNodeType(); public Node getParentNode(); public NodeList getChildNodes(); public Node getFirstChild(); ... public Node insertBefore(Node newChild, Node refChild) throws DOMException; ... }</programlisting> <para>We take <methodname>org.w3c.dom.Node.getChildNodes()</methodname> as an example:</para> <figure xml:id="domRetrieveChildren"> <title>Retrieving child nodes of a given context node</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/domtree.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>The <classname>org.w3c.dom.Node</classname> interface offers a set of common operations for objects being part of an XML document. But an XML document tree contains different types of nodes such as:</para> <itemizedlist> <listitem> <para>Elements</para> </listitem> <listitem> <para>Attributes</para> </listitem> <listitem> <para>Entities</para> </listitem> </itemizedlist> <para>An XML API may address this issue by offering data types to represent these different kinds of nodes. 
The <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> <link linkend="gloss_Java"><trademark>Java</trademark></link> binding defines an inheritance hierarchy of interfaces for this purpose:</para> <figure xml:id="domJavaNodeInterfaces"> <title>Interface inheritance hierarchy in the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> <link linkend="gloss_Java"><trademark>Java</trademark></link> binding</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/nodeHierarchy.svg"/> </imageobject> </mediaobject> </figure> <para>Two commonly used <link linkend="gloss_Java"><trademark>Java</trademark></link> implementations of these interfaces are:</para> <variablelist> <varlistentry> <term>Xerces</term> <listitem> <para><orgname xlink:href="http://xml.apache.org/xerces2-j">Apache Software Foundation</orgname></para> </listitem> </varlistentry> <varlistentry> <term>JAXP</term> <listitem> <para><orgname xlink:href="http://java.sun.com/xml/jaxp">Sun Microsystems</orgname></para> </listitem> </varlistentry> </variablelist> <para>Both implementations offer additional interfaces beyond the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym>'s scope.</para> <para>Going back to the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> itself, the specification is divided into <link xlink:href="http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/introduction.html#DOMArchitecture-h2">modules</link>:</para> <figure xml:id="figureDomModules"> <title><acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> modules.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/dom-architecture.screen.png"/> </imageobject> </mediaobject> </figure> </section> <section xml:id="domCreate"> <title>Creating a new document from scratch</title> <titleabbrev>New document</titleabbrev> <para>If we want to export non-XML content (e.g. 
from an RDBMS) into XML we may achieve this by the following recipe:</para> <orderedlist> <listitem> <para>Create a document builder instance.</para> </listitem> <listitem> <para>Create an empty <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/Document.html">Document</link> instance.</para> </listitem> <listitem> <para>Fill in the desired elements and attributes.</para> </listitem> <listitem> <para>Create a serializer.</para> </listitem> <listitem> <para>Serialize the resulting tree to a stream.</para> </listitem> </orderedlist> <para>An introductory piece of code illustrates these steps:</para> <figure xml:id="simpleDomCreate"> <title>Creation of an XML document instance from scratch.</title> <programlisting language="none">package dom; ... public class CreateDoc { public static void main(String[] args) throws Exception { // Create the root element <emphasis role="bold">final Element titel = new Element("titel"); </emphasis> // Set a date <emphasis role="bold">titel.setAttribute("date", "23.02.2000");</emphasis> // Append a text node as child <emphasis role="bold">titel.addContent(new Text("Versuch 1"));</emphasis> // Set formatting for the XML output <emphasis role="bold">final Format outFormat = Format.getPrettyFormat();</emphasis> // Serialize to console <emphasis role="bold">final XMLOutputter printer = new XMLOutputter(outFormat); printer.output(titel, System.out);</emphasis> } }</programlisting> </figure> <para>We get the following result:</para> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <titel date="23.02.2000">Versuch 1</titel></programlisting> </section> <section xml:id="domCreateExercises"> <title>Exercises</title> <qandaset defaultlabel="qanda" xml:id="createDocModify"> <title>A sub structured <tag class="starttag">title</tag></title> <qandadiv> <qandaentry> <question> <label>Creation of an extended XML document instance</label> <para>In order to run the examples given during the lecture the <filename 
xlink:href="http://www.jdom.org/downloads">jdom2.jar</filename> library must be added to the <envar>CLASSPATH</envar>.</para> <para>The <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> creation example given before may be used as a starting point. Extend the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> tree created in <xref linkend="simpleDomCreate"/> to produce an extended XML document:</para> <programlisting language="none"><title> <long>The long version of this title</long> <short>Short version</short> </title></programlisting> </question> <answer> <programlisting language="none">package dom; ... public class CreateExtended { /** * @param args unused * @throws IOException if writing to the console fails */ public static void main(String[] args) throws IOException { final Element title = new Element("title"), tLong = new Element("long"), tShort = new Element("short"); <emphasis role="bold">// Append <long> and <short> to parent <title></emphasis> title.addContent(tLong).addContent(tShort); <emphasis role="bold">// Append text to <long> and <short></emphasis> tLong.addContent(new Text("The long version of this title")); tShort.addContent(new Text("Short version")); <emphasis role="bold">// Set formatting for the XML output</emphasis> final Format outFormat = Format.getPrettyFormat(); <emphasis role="bold">// Serialize to console</emphasis> final XMLOutputter printer = new XMLOutputter(outFormat); printer.output(title, System.out); } }</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="domParse"> <title>Parsing existing XML documents</title> <titleabbrev>Parsing</titleabbrev> <para>We already used <acronym xlink:href="http://www.saxproject.org">SAX</acronym> to parse an XML document. Rather than handling <acronym xlink:href="http://www.saxproject.org">SAX</acronym> events ourselves these events may be used to construct a <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> representation of our document. 
This work is done by an instance of <classname>org.jdom2.input.SAXBuilder</classname>. We use our catalog example from <xref linkend="simpleCatalog"/> as an introductory example.</para> <para>We already noticed the need for an <classname>org.xml.sax.ErrorHandler</classname> object during <acronym xlink:href="http://www.saxproject.org">SAX</acronym> processing. A <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> parser requires a similar type of object in order to react to parsing errors in a meaningful way. In principle a <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> parser implementor is free to choose an implementation strategy but most implementations are built on top of a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parser. For this reason it was natural to choose a <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> error handling interface which is similar to a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> <classname>org.xml.sax.ErrorHandler</classname>. The following code serves the needs described before:</para> <figure xml:id="domTreeTraversal"> <title>Accessing an XML tree purely by <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> methods.</title> <programlisting language="none">package dom; ... public class ArticleOrder { <emphasis role="bold"> // Though we are playing DOM here, a <acronym xlink:href="http://www.saxproject.org">SAX</acronym> parser still // assembles our DOM tree.</emphasis> private SAXBuilder builder = new SAXBuilder(); public ArticleOrder() { <emphasis role="bold">// Though an ErrorHandler is not strictly required it allows // for easier localization of XML document errors</emphasis> builder.setErrorHandler(new MySaxErrorHandler(System.out));<co linkends="domSetSaxErrorHandler-co" xml:id="domSetSaxErrorHandler"/> } /** Descending a catalog till its <item> elements. For each product * its name and order number are being written to the output. * @throws ... 
*/ public void process(final String filename) throws JDOMException, IOException { <emphasis role="bold">// Parsing our XML file</emphasis> final Document docInput = builder.build(filename); <emphasis role="bold">// Accessing the document's root element</emphasis> final Element docRoot = docInput.getRootElement(); <emphasis role="bold">// Accessing the <item> children of parent element <catalog></emphasis> final List<Element> items = docRoot.getChildren(); // Element nodes only for (final Element item : items) { System.out.println("Article: " + item.getText() + ", order number: " + item.getAttributeValue("orderNo")); } ...</programlisting> <para>Note <coref linkend="domSetSaxErrorHandler" xml:id="domSetSaxErrorHandler-co"/>: This is our standard <acronym xlink:href="http://www.saxproject.org">SAX</acronym> error handler implementing the <classname>org.xml.sax.ErrorHandler</classname> interface.</para> </figure> <para>Executing this method needs a driver instance providing an input XML filename:</para> <programlisting language="none">package dom; ... 
public class ArticleOrderDriver { public static void main(String[] argv) throws Exception { final ArticleOrder ao = new ArticleOrder(); ao.process("<emphasis role="bold">Input/article.xml</emphasis>"); } }</programlisting> <para>This yields:</para> <programlisting language="none">Article: Swinging headset, order number: 3218 Article: 200W Stereo Amplifier, order number: 9921</programlisting> <para>To illustrate the internal processes we take a look at the sequence diagram:</para> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/sequenceDomParser.svg"/> </imageobject> </mediaobject> <qandaset defaultlabel="qanda" xml:id="exercise_domHtmlSimple"> <title>Creating HTML output</title> <qandadiv> <qandaentry> <question> <label>Simple HTML output</label> <para>Instead of exporting simple text output as in <xref linkend="domTreeTraversal"/> we may also create HTML pages like:</para> <programlisting language="none"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <title>Available articles</title> </head> <body> <h1>Available articles</h1> <table> <tbody> <tr> <th align="left">Article Description</th><th>Order Number</th> </tr> <tr> <td align="left"><emphasis role="bold">Swinging headset</emphasis></td><td><emphasis role="bold">3218</emphasis></td> </tr> <tr> <td align="left"><emphasis role="bold">200W Stereo Amplifier</emphasis></td><td><emphasis role="bold">9921</emphasis></td> </tr> </tbody> </table> </body> </html></programlisting> <para>Instead of simply writing <code>...println(<html>\n\t<head>...)</code> statements you are expected to code a more sophisticated solution. We may combine <xref linkend="createDocModify"/> and <xref linkend="domTreeTraversal"/>. The idea is to read the XML catalog instance into a <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> tree as before. 
Then construct a <emphasis>second</emphasis> <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> tree for the desired HTML output and fill in the article information from the first <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> tree accordingly.</para> </question> <answer> <para>We introduce a class <classname>solve.dom.HtmlTree</classname>:</para> <programlisting language="none">package solve.dom; import java.io.IOException; import java.io.PrintStream; import org.jdom2.DocType; import org.jdom2.Document; import org.jdom2.Element; import org.jdom2.Text; import org.jdom2.output.Format; import org.jdom2.output.XMLOutputter; /** * Holding an HTML DOM to produce output. * @author goik */ public class HtmlTree { private Document htmlOutput; private Element tableBody; public HtmlTree(final String titleText, final String[] tableHeaderFields) { <co linkends="programlisting_catalog2html_htmlskel_co" xml:id="programlisting_catalog2html_htmlskel"/> DocType doctype = new DocType("html", "-//W3C//DTD XHTML 1.0 Strict//EN", "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"); final Element htmlRoot = new Element("html"); <co linkends="programlisting_catalog2html_tablehead_co" xml:id="programlisting_catalog2html_tablehead"/> htmlOutput = new Document(htmlRoot); htmlOutput.setDocType(doctype); // We create an HTML skeleton including an "empty" table final Element head = new Element("head"), body = new Element("body"), table = new Element("table"); htmlRoot.addContent(head).addContent(body); head.addContent(new Element("title").addContent(new Text(titleText))); body.addContent(new Element("h1").addContent(new Text(titleText))); body.addContent(table); tableBody = new Element("tbody"); table.addContent(tableBody); // addContent() returns the parent, so we keep our own <tr> reference final Element tr = new Element("tr"); tableBody.addContent(tr); for (final String headerField: tableHeaderFields) { tr.addContent(new Element("th").addContent(new Text(headerField))); } } public void appendItem(final String itemName, 
final String orderNo) {<co linkends="programlisting_catalog2html_insertproduct_co" xml:id="programlisting_catalog2html_insertproduct"/> final Element tr = new Element("tr"); tableBody.addContent(tr); tr.addContent(new Element("td").addContent(new Text(itemName))); tr.addContent(new Element("td").addContent(new Text(orderNo))); } public void serialize(PrintStream out){ // Set formatting for the XML output final Format outFormat = Format.getPrettyFormat(); // Serialize to the given stream final XMLOutputter printer = new XMLOutputter(outFormat); try { printer.output(htmlOutput, out); } catch (IOException e) { e.printStackTrace(); System.exit(1); } } /** * @return the table's <tbody> element */ public Element getTable() { return tableBody; } } </programlisting> <calloutlist> <callout arearefs="programlisting_catalog2html_htmlskel" xml:id="programlisting_catalog2html_htmlskel_co"> <para>A basic HTML skeleton is being created:</para> <programlisting language="none"><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Available articles</title> </head> <body> <h1>Available articles</h1> <table> <emphasis role="bold"><tbody></emphasis> <!-- Data to be inserted here in next step --> <emphasis role="bold"></tbody></emphasis> </table> </body> </html></programlisting> <para>The table that will hold the products' data is empty at this point and thus invalid.</para> </callout> <callout arearefs="programlisting_catalog2html_tablehead" xml:id="programlisting_catalog2html_tablehead_co"> <para>The table's header is appended but the actual data from our two products is still missing:</para> <programlisting language="none">... 
<h1>Available articles</h1> <table> <tbody> <tr> <th>Article Description</th> <th>Order Number</th> <emphasis role="bold"></tr></emphasis><!-- Data to be appended after this row in next step --> <emphasis role="bold"></tbody></emphasis> </table> ...</programlisting> </callout> <callout arearefs="programlisting_catalog2html_insertproduct" xml:id="programlisting_catalog2html_insertproduct_co"> <para>Calling <methodname>solve.dom.HtmlTree.appendItem(String,String)</methodname> once per product completes the creation of our HTML DOM tree:</para> <programlisting language="none">... </tr> <tr> <td>Swinging headset</td> <td>3218</td> </tr> <tr> <td>200W Stereo Amplifier</td> <td>9921</td> </tr> </tbody> ...</programlisting> </callout> </calloutlist> <para>The class <classname>solve.dom.Article2Html</classname> reads the catalog data:</para> <programlisting language="none">package solve.dom; ... public class Article2Html { private final SAXBuilder builder = new SAXBuilder(); private final HtmlTree htmlResult; public Article2Html() { builder.setErrorHandler(new MySaxErrorHandler(System.out)); htmlResult = new HtmlTree("Available articles", new String[] { <co linkends="programlisting_catalog2html_glue_createhtmldom_co" xml:id="programlisting_catalog2html_glue_createhtmldom"/> "Article Description", "Order Number" }); } /** Read an XML catalog instance and insert product names along with their * order numbers into the HTML DOM. Then serialize the HTML tree to a stream. * * @param filename * Filename of the XML source. * @param out * The output stream for HTML serialization. 
* @throws IOException * @throws JDOMException */ public void process(final String filename, final PrintStream out) throws JDOMException, IOException{ final List<Element> items = builder.build(filename).getRootElement().getChildren(); for (final Element item : items) { <co linkends="programlisting_catalog2html_glue_prodloop_co" xml:id="programlisting_catalog2html_glue_prodloop"/> htmlResult.appendItem(item.getText(), item.getAttributeValue("orderNo")); <co linkends="programlisting_catalog2html_glue_insertprod_co" xml:id="programlisting_catalog2html_glue_insertprod"/> } htmlResult.serialize(out); <co linkends="programlisting_catalog2html_glue_serialize_co" xml:id="programlisting_catalog2html_glue_serialize"/> } }</programlisting> <calloutlist> <callout arearefs="programlisting_catalog2html_glue_createhtmldom" xml:id="programlisting_catalog2html_glue_createhtmldom_co"> <para>Create an instance holding an HTML <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> with a table header containing the strings <emphasis>Article Description</emphasis> and <emphasis>Order Number</emphasis>.</para> </callout> <callout arearefs="programlisting_catalog2html_glue_prodloop" xml:id="programlisting_catalog2html_glue_prodloop_co"> <para>Iterate over all product nodes.</para> </callout> <callout arearefs="programlisting_catalog2html_glue_insertprod" xml:id="programlisting_catalog2html_glue_insertprod_co"> <para>Insert the product's name and order number into the HTML <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym>.</para> </callout> <callout arearefs="programlisting_catalog2html_glue_serialize" xml:id="programlisting_catalog2html_glue_serialize_co"> <para>Serialize the completed HTML <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> tree to the output stream.</para> </callout> </calloutlist> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="domJavaScript"> <title>Using <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> with 
HTML/JavaScript</title> <para>Due to script language support in a variety of browsers we may also use the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> to implement client side event handling. As an example we <link xlink:href="Ref/src/tablesort.html">demonstrate</link> how an HTML table can be made sortable by clicking on a column's header. The example code along with the code description can be found at <uri xlink:href="http://www.kryogenix.org/code/browser/sorttable">http://www.kryogenix.org/code/browser/sorttable</uri>.</para> <para>Quite remarkably only a few ingredients are required to enrich an ordinary static HTML table with this functionality:</para> <itemizedlist> <listitem> <para>An external JavaScript library has to be included via <code><script type="text/javascript" src="sorttable.js"></code></para> </listitem> <listitem> <para>Each sortable HTML table needs:</para> <itemizedlist> <listitem> <para>A unique <code>id</code> attribute</para> </listitem> <listitem> <para>A <code>class="sortable"</code> attribute</para> </listitem> </itemizedlist> </listitem> </itemizedlist> </section> <section xml:id="domXpath"> <title>Using <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym></title> <para><xref linkend="domTreeTraversal"/> demonstrated the possibility to traverse trees solely by using <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> method calls. Though this approach is possible it will in general not lead to stable applications. Real world examples are often based on large XML documents with complex hierarchical structures. Accessing the desired node sets with this rather primitive approach thus requires deeply nested method calls. In addition changing the conceptual schema will require rewriting large portions of the code.</para> <para>As we already know from <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> transformations <code>XPath</code> allows addressing node sets inside an XML tree. 
The role of <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> can be compared to SQL queries when working with relational databases. <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> may also be used within <link linkend="gloss_Java"><trademark>Java</trademark></link> code. As a first example we show an application extracting image filenames from XHTML documents. The following example contains three <tag class="starttag">img</tag> elements:</para> <figure xml:id="htmlGallery"> <title>An HTML document containing <code>IMG</code> tags.</title> <programlisting language="none"><?xml version="1.0"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html> <head> <title>Picture gallery</title> </head> <body> <h1>Picture gallery</h1> <p>Images may appear inline:<emphasis role="bold"><img src="inline.gif" alt="none"/></emphasis></p> <table> <tbody> <tr> <td>Number one:</td> <td><emphasis role="bold"><img src="one.gif" alt="none"/></emphasis></td> </tr> <tr> <td>Number two:</td> <td><emphasis role="bold"><img src="http://www.hdm-stuttgart.de/favicon.ico" alt="none"/></emphasis></td> </tr> </tbody> </table> </body> </html> </programlisting> </figure> <para>A given HTML document may contain <tag class="starttag">img</tag> elements at <emphasis>arbitrary</emphasis> positions. It is sometimes desirable to check for the existence and accessibility of such external objects, since they are required for the page's correct rendering. 
A simple XSL script will do the first part of the job, namely extracting the <tag class="starttag">img</tag> elements:</para> <figure xml:id="gallery2imagelist"> <title>An <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> script for image name extraction.</title> <programlisting language="none"><xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:html="http://www.w3.org/1999/xhtml"> <xsl:output method="text"/> <xsl:template match="/"> <xsl:for-each select="//html:img"> <xsl:value-of select="@src"/> <xsl:text> </xsl:text> </xsl:for-each> </xsl:template> </xsl:stylesheet></programlisting> </figure> <para>Note the necessity for <code>html</code> namespace inclusion into the <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> expression in <code><xsl:for-each select="//html:img"></code>. A simple <code>select="//img"</code> results in an empty node set. Executing the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> script yields a list of the image references contained in the HTML page, i.e. <code>inline.gif one.gif http://www.hdm-stuttgart.de/favicon.ico</code>.</para> <para>Now we want to write a <link linkend="gloss_Java"><trademark>Java</trademark></link> application which checks whether these referenced image files exist and have sufficient permissions to be accessed. A simple approach may pipe the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> output to our application which then executes the readability checks. Instead we want to incorporate the <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> based search into the application. 
Ignoring namespaces and trying to resemble the <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> actions as closely as possible our application will have to search for <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/Element.html">Element</link> nodes by the <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> expression <code>//img</code>:</para> <figure xml:id="domFindImages"> <title>Extracting <tag class="emptytag">img</tag> element image references from an HTML document.</title> <programlisting language="none">package dom.xpath; ... public class DomXpath { private final SAXBuilder builder = new SAXBuilder(); public DomXpath() { builder.setErrorHandler(new MySaxErrorHandler(System.err)); } public void process(final String xhtmlFilename) throws JDOMException, IOException { final Document htmlInput = builder.build(xhtmlFilename);<co linkends="programlisting_java_searchimg_parse_co" xml:id="programlisting_java_searchimg_parse"/> final XPathExpression<Object> xpath = XPathFactory.instance().compile( "//img" ); <co linkends="programlisting_java_searchimg_pf_co" xml:id="programlisting_java_searchimg_pf"/> <co linkends="programlisting_java_searchimg_newxpath_co" xml:id="programlisting_java_searchimg_newxpath"/> final List<Object> images = xpath.evaluate(htmlInput);<co linkends="programlisting_java_searchimg_execquery_co" xml:id="programlisting_java_searchimg_execquery"/> for (Object o: images) { <co linkends="programlisting_java_searchimg_loop_co" xml:id="programlisting_java_searchimg_loop"/> final Element image = (Element ) o;<co linkends="programlisting_java_searchimg_cast_co" xml:id="programlisting_java_searchimg_cast"/> // getAttributeValue() yields the plain string rather than an Attribute object System.out.print(image.getAttributeValue("src") + " "); } } }</programlisting> <caption> <para>This application searches for <tag class="emptytag">img</tag> elements and shows their <code>src</code> attribute value.</para> </caption> </figure> <calloutlist> <callout 
arearefs="programlisting_java_searchimg_parse" xml:id="programlisting_java_searchimg_parse_co"> <para>Parse an XHTML document instance into a DOM tree.</para> </callout> <callout arearefs="programlisting_java_searchimg_pf" xml:id="programlisting_java_searchimg_pf_co"> <para>Create an <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> factory.</para> </callout> <callout arearefs="programlisting_java_searchimg_newxpath" xml:id="programlisting_java_searchimg_newxpath_co"> <para>Create an <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> query instance. This may be used to search for a set of nodes starting from a context node.</para> </callout> <callout arearefs="programlisting_java_searchimg_execquery" xml:id="programlisting_java_searchimg_execquery_co"> <para>Using the document's root node as the context node we search for <tag class="starttag">img</tag> elements appearing at arbitrary positions in our document.</para> </callout> <callout arearefs="programlisting_java_searchimg_loop" xml:id="programlisting_java_searchimg_loop_co"> <para>We iterate over the retrieved list of images.</para> </callout> <callout arearefs="programlisting_java_searchimg_cast" xml:id="programlisting_java_searchimg_cast_co"> <para>Cast each result to the correct type.</para> </callout> </calloutlist> <para>The result is a list of image filename references:</para> <programlisting language="none">inline.gif one.gif http://www.hdm-stuttgart.de/favicon.ico </programlisting> <qandaset defaultlabel="qanda" xml:id="quandaentry_CastAlwaysLegal"> <title>Legal casting?</title> <qandadiv> <qandaentry> <question> <para>Why is the cast in <coref linkend="programlisting_java_searchimg_cast"/> in <xref linkend="domFindImages"/> guaranteed to never cause a <classname>java.lang.ClassCastException</classname>?</para> </question> <answer> <para>The <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> <code>//img</code> expression is guaranteed to return only <tag 
class="starttag">img</tag> elements. Thus within our <link linkend="gloss_Java"><trademark>Java</trademark></link> context we are sure to find only <classname>org.jdom2.Element</classname> instances.</para> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset defaultlabel="qanda" xml:id="exercise_htmlImageVerify"> <title>Verification of referenced images readability</title> <qandadiv> <qandaentry> <question> <para>We want to extend the example given in <xref linkend="domFindImages"/> by testing the existence and checking for readability of referenced images. The following HTML document contains <quote>dead</quote> image references:</para> <programlisting language="none" xml:id="domCheckImageAccessibility"><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> ... <body> <h1>External Pictures</h1> <p>A local image reference:<img src="inline.gif" alt="none"/></p> <table> <tbody> <tr> <td>An existing picture:</td> <td><img src="http://www.hdm-stuttgart.de/bilder_navigation/laptop.gif" alt="none"/></td> </tr> <tr> <td>A non-existing picture:</td> <td><img src="<emphasis role="bold">http://www.hdm-stuttgart.de/rotfl.gif</emphasis>" alt="none"/></td> </tr> </tbody> </table> </body> </html></programlisting> <para>Write an application which checks for readability of <abbrev xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev> image references to <emphasis>external</emphasis> Servers starting either with <code>http://</code> or <code>ftp://</code> ignoring other protocol types. Internal image references referring to the <quote>current</quote> server typically look like <code><img src="/images/test.gif"</code>. 
To distinguish these two types of references we may use the XPath built-in function <link xlink:href="http://www.cafeconleche.org/books/bible2/chapters/ch17.html">starts-with()</link> testing for the <code>http</code> or <code>ftp</code> protocol definition part of a <abbrev xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>. A possible output for the given example is:</para> <programlisting language="none">Received 'sun.awt.image.URLImageSource' from http://www.hdm-stuttgart.de/bilder_navigation/laptop.gif Unable to open 'http://www.hdm-stuttgart.de/rotfl.gif'</programlisting> <para>The following code snippet shows a helpful class method to check both the correctness of <abbrev xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>s and the accessibility of referenced objects:</para> <programlisting language="none">package dom.xpath; ... public class CheckUrl { public static void checkReadability(final String urlRef) { try { final URL url = new URL(urlRef); try { final Object imgCandidate = url.getContent(); if (null == imgCandidate) { System.err.println("Unable to open '" + urlRef + "'"); } else { System.out.println("Received '" + imgCandidate.getClass().getName() + "' from " + urlRef); } } catch (IOException e) { System.err.println("Unable to open '" + urlRef + "'"); } } catch (MalformedURLException e) { System.err.println("Address '" + urlRef + "' is malformed"); } } }</programlisting> </question> <answer> <para>We are interested in the set of images within a given HTML document containing a <link xlink:href="http://www.w3.org/Addressing">URL</link> reference starting either with <code>http://</code> or <code>ftp://</code>. 
This is achieved by the following <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> expression:</para> <programlisting language="none">//html:img[starts-with(@src, 'http://') or starts-with(@src, 'ftp://')]</programlisting> <para>The application only needs to pass the corresponding <abbrev xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>'s to the method <link xlink:href="domCheckUrlObjectExistence">CheckUrl.checkReadability()</link>. The rest of the code is identical to the <link linkend="domFindImages">introductory example</link>:</para> <informalfigure xml:id="solutionFintExtImgRef"> <programlisting language="none">package dom.xpath; ... public class CheckExtImage { private final SAXBuilder builder = new SAXBuilder(); public CheckExtImage() { builder.setErrorHandler(new MySaxErrorHandler(System.err)); } public void process(final String xhtmlFilename) throws JDOMException, IOException { final Document htmlInput = builder.build(xhtmlFilename); final XPathExpression<Object> xpath = XPathFactory.instance().compile( "<emphasis role="bold">//img[starts-with(@src, 'http://') or starts-with(@src, 'ftp://')]</emphasis>"); final List<Object> images = xpath.evaluate(htmlInput); for (Object o: images) { final Element image = (Element ) o; <emphasis role="bold">CheckUrl.checkReadability(image.getAttributeValue("src"));</emphasis> } } }</programlisting> </informalfigure> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="domXsl"> <title><acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> and <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev></title> <para><link linkend="gloss_Java"><trademark>Java</trademark></link> based <link linkend="gloss_XML"><abbrev>XML</abbrev></link> applications may use XSL style sheets for processing. A <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> tree may for example be transformed into another tree. 
The package <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/transform/package-frame.html">javax.xml.transform</link> provides interfaces and classes for this purpose. We consider the following product catalog example:</para> <figure xml:id="climbingCatalog"> <title>A simplified <link linkend="gloss_XML"><abbrev>XML</abbrev></link> product catalog</title> <programlisting language="none"><catalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="catalog.xsd"> <title>Outdoor products</title> <introduction> <para>We offer a great variety of basic stuff for mountaineering such as ropes, harnesses and tents.</para> <para>Our shop is proud for its large number of available sleeping bags.</para> </introduction> <product id="x-223"> <title>Multi freezing bag Nightmare camper</title> <description> <para>You will feel comfortable till minus 20 degrees - At least if you are a penguin or a polar bear.</para> </description> </product> <product id="r-334"> <title>Rope 40m</title> <description> <para>Excellent for indoor climbing.</para> </description> </product> </catalog></programlisting> <para>A corresponding schema file <filename>catalog.xsd</filename> is straightforward:</para> <programlisting language="none"><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified" vc:minVersion="1.0" vc:maxVersion="1.1"> <xs:simpleType name="money"> <xs:restriction base="xs:decimal"> <xs:fractionDigits value="2"/> </xs:restriction> </xs:simpleType> <xs:element name="title" type="xs:string"/> <xs:element name="para" type="xs:string"/> <xs:element name="description" type="paraSequence"/> <xs:element name="introduction" type="paraSequence"/> <xs:complexType name="paraSequence"> <xs:sequence> <xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:element name="product"> <xs:complexType> <xs:sequence> <xs:element 
ref="title"/> <xs:element ref="description"/> </xs:sequence> <xs:attribute name="id" type="xs:ID" use="required"/> <xs:attribute name="price" type="money" use="optional"/> </xs:complexType> </xs:element> <xs:element name="catalog"> <xs:complexType> <xs:sequence> <xs:element ref="title"/> <xs:element ref="introduction"/> <xs:element ref="product" minOccurs="1" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> </xs:schema> </programlisting> </figure> <para>An <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet may be used to transform this document into HTML format:</para> <figure xml:id="catalog2html"> <title>An <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet for catalog transformation to HTML.</title> <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" xmlns="http://www.w3.org/1999/xhtml"> <xsl:template match="/catalog"> <html> <head><title><xsl:value-of select="title"/></title></head> <body style="background-color:#FFFFFF"> <h1><xsl:value-of select="title"/></h1> <xsl:apply-templates select="product"/> </body> </html> </xsl:template> <xsl:template match="product"> <h3><xsl:value-of select="title"/></h3> <xsl:for-each select="description/para"> <p><xsl:value-of select="."/></p> </xsl:for-each> <xsl:if test="@price"> <p> <xsl:text>Price:</xsl:text> <xsl:value-of select="@price"/> </p> </xsl:if> </xsl:template> </xsl:stylesheet></programlisting> </figure> <para>As a preparation for <xref linkend="exercise_catalogRdbms"/> we now demonstrate the usage of <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> within a <link linkend="gloss_Java"><trademark>Java</trademark></link> application. 
This is done by a <link xlink:href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/transform/Transformer.html">Transformer</link> instance:</para> <figure xml:id="xml2xml"> <title>Transforming an XML document instance to HTML by a XSL style sheet.</title> <programlisting language="none">package dom.xsl; ... public class Xml2Html { private final SAXBuilder builder = new SAXBuilder(); final XSLTransformer transformer; public Xml2Html(final String xslFilename) throws XSLTransformException { builder.setErrorHandler(new MySaxErrorHandler(System.err)); transformer = new XSLTransformer(xslFilename); } public void transform(final String xmlInFilename, final String resultFilename) throws JDOMException, IOException { final Document inDoc = builder.build(xmlInFilename); Document result = transformer.transform(inDoc); // Set formatting for the XML output final Format outFormat = Format.getPrettyFormat(); // Serialize to console final XMLOutputter printer = new XMLOutputter(outFormat); printer.output(result.getDocument(), System.out); } }</programlisting> </figure> <para>A corresponding driver file is needed to invoke a transformation:</para> <figure xml:id="xml2xmlDriver"> <title>A driver class for the xml2xml transformer.</title> <programlisting language="none">package dom.xsl; ... public class Xml2HtmlDriver { ... 
public static void main(String[] args) { final String inFilename = "Input/Dom/climbing.xml", xslFilename = "Input/Dom/catalog2html.xsl", htmlOutputFilename = "Input/Dom/climbing.html"; try { final Xml2Html converter = new Xml2Html(xslFilename); converter.transform(inFilename, htmlOutputFilename); } catch (Exception e) { System.err.println("The conversion of '" + inFilename + "' by stylesheet '" + xslFilename + "' to output HTML file '" + htmlOutputFilename + "' failed with the following error:" + e); e.printStackTrace(); } } }</programlisting> </figure> <qandaset defaultlabel="qanda" xml:id="exercise_catalogRdbms"> <title>HTML from XML and relational data</title> <qandadiv> <qandaentry> <question> <label>Catalogs and RDBMS</label> <para>We want to extend the transformation being described before in <xref linkend="xml2xml"/> by reading price information from a RDBMS. Consider the following schema and <code>INSERT</code>s:</para> <programlisting language="none">CREATE TABLE Product( orderNo CHAR(10) ,price NUMERIC(10,2) ); INSERT INTO Product VALUES('x-223', 330.20); INSERT INTO Product VALUES('w-124', 110.40);</programlisting> <para>Adding prices may be implemented the following way:</para> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xml2html.fig"/> </imageobject> </mediaobject> <para>You may implement this by following these steps:</para> <orderedlist> <listitem> <para>You may reuse class <classname>sax.rdbms.RdbmsAccess</classname> from <xref linkend="saxRdbms"/>.</para> </listitem> <listitem> <para>Use the previous class to modify <xref linkend="xml2xml"/> by introducing a new method <code>addPrices(final Document catalog)</code> which adds prices to the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> tree accordingly. 
The insertion points may be reached by an <acronym xlink:href="http://www.w3.org/TR/xpath">XPath</acronym> expression.</para> </listitem> </orderedlist> </question> <answer> <para>The additional functionality on top of <xref linkend="xml2xml"/> is provided by the method <methodname>dom.xsl.XmlRdbms2Html.addPrices()</methodname>. This method modifies the <acronym xlink:href="http://www.w3.org/DOM">DOM</acronym> input tree prior to applying the XSL style sheet. Prices are inserted based on data received from an RDBMS via <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>:</para> <programlisting language="none">package dom.xsl; ... public class XmlRdbms2Html { private final SAXBuilder builder = new SAXBuilder(); DbAccess db = new DbAccess(); final XSLTransformer transformer; Document catalog; final org.jdom2.xpath.XPathExpression<Object> selectProducts = XPathFactory.instance().compile("/catalog/product"); /** * @param xslFilename the stylesheet being used for subsequent * transformations by {@link #transform(String, String)}. * * @throws XSLTransformException */ public XmlRdbms2Html(final String xslFilename) throws XSLTransformException { builder.setErrorHandler(new MySaxErrorHandler(System.err)); transformer = new XSLTransformer(xslFilename); } /** * The actual workhorse carrying out the transformation * and adding prices from the database table. * * @param xmlInFilename input file to be transformed * @param resultFilename the result file holding the generated HTML document * @throws JDOMException The transformation may fail for various reasons. 
* @throws IOException */ public void transform(final String xmlInFilename, final String resultFilename) throws JDOMException, IOException { catalog = builder.build(xmlInFilename); addPrices(); final Document htmlResult = transformer.transform(catalog); // Set formatting for the XML output final Format outFormat = Format.getPrettyFormat(); // Serialize to console final XMLOutputter printer = new XMLOutputter(outFormat); printer.output(htmlResult, System.out); } private void addPrices() { final List<Object> products = selectProducts.evaluate(catalog.getRootElement()); db.connect("jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); for (Object p: products) { final Element product = (Element ) p; final String productId = product.getAttributeValue("id"); product.setAttribute("price", db.readPrice(productId)); } db.close(); } }</programlisting> <para>The method <code>addPrices(...)</code> utilizes our RDBMS access class:</para> <programlisting language="none">package dom.xsl; ... public class DbAccess { public void connect(final String jdbcUrl, final String userName, final String password) { try { conn = DriverManager.getConnection(jdbcUrl, userName, password); priceQuery = conn.prepareStatement(sqlPriceQuery); } catch (SQLException e) { System.err.println("Unable to open connection to database:" + e);} } public String readPrice(final String articleNumber) { String result; try { priceQuery.setString(1, articleNumber); final ResultSet rs = priceQuery.executeQuery(); if (rs.next()) { result = rs.getString("price"); } else { result = "No price available for article '" + articleNumber + "'"; } } catch (SQLException e) { result = "Error reading price for article '" + articleNumber + "':" + e; } return result; } ... 
}</programlisting> <para>Of course the connection details should be moved to a configuration file.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> </section> </chapter> <chapter xml:id="introPersistence"> <title>Accessing Relational Data</title> <section xml:id="persistence"> <title>Persistence in Object Oriented languages</title> <para>Following <xref linkend="bib_Bauer05"/> we may define persistence by:</para> <blockquote> <para>persistence allows an object to outlive the process that created it. The state of the object may be stored to disk and an object with the same state re-created at some point in the future.</para> </blockquote> <para>The notion of <quote>process</quote> refers to operating systems. Let us start with a simple example assuming a <link linkend="gloss_Java"><trademark>Java</trademark></link> class <code>User</code>:</para> <programlisting language="none">public class User { String cname; //The user's common name e.g. 'Joe Bix' String uid; //The user's unique system ID (login name) e.g. 'bix' // getters, setters and other stuff ... }</programlisting> <para>A relational implementation might look like:</para> <programlisting language="none">CREATE TABLE User( cname CHAR(80) ,uid CHAR(10) PRIMARY KEY )</programlisting> <para>Now a <link linkend="gloss_Java"><trademark>Java</trademark></link> application may create instances of class <code>User</code> and save these to a database:</para> <figure xml:id="processObjPersist"> <title>Persistence across process boundaries</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/persistence.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>Both the <trademark xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark> instances and the RDBMS database server are processes (or sets of processes) typically existing in different address spaces. 
The two <trademark xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark> processes mentioned here may as well be started in disjoint address spaces. In fact we might even run two entirely different applications implemented in different programming languages like <abbrev xlink:href="http://www.php.net">PHP</abbrev>.</para> <para>It is important to mention that the two arrows <quote>save</quote> and <quote>load</quote> thus typically denote a communication across machine boundaries.</para> </section> <section xml:id="jdbcIntro"> <title>Introduction to <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark></title> <section xml:id="jdbcWrite"> <title>Write access, principles</title> <para>Connecting an application to a database means to establish a connection from a client to a database server:</para> <figure xml:id="jdbcClientServer"> <title>Networking between clients and database servers</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/clientserv.fig"/> </imageobject> </mediaobject> </figure> <para>So <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> is just one among a whole bunch of protocol implementations connecting database servers and applications. Consequently <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> is expected to appear in the lower layer of multi-tier applications. We take a three-tier application as a starting point:</para> <figure xml:id="jdbcThreeTier"> <title>The role of <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> in a three-tier application</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcThreeTier.fig"/> </imageobject> </mediaobject> </figure> <para>We may add an additional layer. 
Web applications are typically built on top of an application server (<productname xlink:href="http://www.ibm.com/software/de/websphere/">WebSphere</productname>, <productname xlink:href="http://glassfish.java.net">Glassfish</productname>, <productname xlink:href="http://www.jboss.org/jbossas">Jboss</productname>, ...) providing additional services:</para> <figure xml:id="jdbcFourTier"> <title><trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> connecting application server and database.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcFourTier.fig"/> </imageobject> </mediaobject> </figure> <para>So what is actually required to connect to a database server? A client requires the following parameter values to open a connection:</para> <orderedlist> <listitem xml:id="ItemJdbcProtocol"> <para>The type of database server e.g. <productname xlink:href="http://www.oracle.com/us/products/database">Oracle</productname>, <productname xlink:href="www.ibm.com/software/data/db2">DB2</productname>, <productname xlink:href="http://www-01.ibm.com/software/data/informix">Informix</productname>, <productname xlink:href="http://www.mysql.com">Mysql</productname> etc. This information is needed because of vendor dependent <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> protocol implementations.</para> </listitem> <listitem> <para>The server's <link xlink:href="http://en.wikipedia.org/wiki/Domain_Name_System">DNS</link> name or IP address</para> </listitem> <listitem> <para>The database service's port number at the previously defined host. 
The database server process listens for connections to this port number.</para> </listitem> <listitem xml:id="itemJdbcDatabaseName"> <para>The database name within the given database server</para> </listitem> <listitem> <para>Optional: A database user's account name and password.</para> </listitem> </orderedlist> <para>Items <xref linkend="ItemJdbcProtocol"/> - <xref linkend="itemJdbcDatabaseName"/> will be encapsulated into a so called <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> <link xlink:href="http://en.wikipedia.org/wiki/Uniform_Resource_Locator">URL</link>. We consider a typical example corresponding to the previous parameter list:</para> <figure xml:id="jdbcUrlComponents"> <title>Components of a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> URL</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcurl.fig" scale="65"/> </imageobject> </mediaobject> </figure> <para>In fact this <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> URL example closely resembles other types of URL strings as being defined in <uri xlink:href="http://www.ietf.org/rfc/rfc2396.txt">http://www.ietf.org/rfc/rfc2396.txt</uri>. Look for <code>opaque_part</code> to understand the second <quote>:</quote> in the protocol definition part of a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> URL. 
Common examples of <abbrev xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev>s are:</para> <itemizedlist> <listitem> <para><code>http://www.hdm-stuttgart.de/aaa</code></para> </listitem> <listitem> <para><code>http://someserver.com:8080/someResource</code></para> </listitem> <listitem> <para><code>ftp://mirror.mi.hdm-stuttgart.de/Firmen</code></para> </listitem> </itemizedlist> <para>We notice the explicit port number 8080 in the second example. The default <abbrev xlink:href="http://www.w3.org/Protocols">http</abbrev> protocol port number is 80. So if a web server accepts connections at port 80 we do not have to specify this value. A web browser will automatically use this default port.</para> <para>Actually the notion <quote><code>jdbc:mysql</code></quote> denotes a sub protocol implementation, namely <orgname>Mysql</orgname>'s implementation of <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>. Connecting to an IBM DB2 server would require <code>jdbc:db2</code> for this protocol part.</para> <para>In contrast to <abbrev xlink:href="http://www.w3.org/Protocols">http</abbrev> no standard ports are <quote>officially</quote> assigned for <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> protocol variants. Due to vendor specific implementations this would not make sense. 
Thus we <emphasis role="bold">always</emphasis> have to specify the port number when opening <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connections.</para> <para>Writing <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> based applications follows a simple scheme:</para> <figure xml:id="jdbcArchitecture"> <title>Architecture of JDBC</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcarch.fig"/> </imageobject> </mediaobject> </figure> <para>From a programmer's point of view the <classname>java.sql.DriverManager</classname> is a bootstrapping object: Other objects like <classname>java.sql.Statement</classname> instances are created, directly or indirectly, from this central object.</para> <para>The first instance being created by the <classname>java.sql.DriverManager</classname> is an object of type <classname>java.sql.Connection</classname>. In <xref linkend="exerciseJdbcWhyInterface"/> we discuss the way vendor specific implementation details are hidden by interfaces. We can distinguish between:</para> <orderedlist> <listitem> <para>Vendor neutral parts of a <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> environment. These are the components shipped by Oracle or other organizations providing <link linkend="gloss_Java"><trademark>Java</trademark></link> runtimes. The class <classname>java.sql.DriverManager</classname> belongs to this domain.</para> </listitem> <listitem> <para>Vendor specific parts. 
In <xref linkend="jdbcArchitecture"/> this starts with the <classname>java.sql.Connection</classname> object.</para> </listitem> </orderedlist> <para>The <classname>java.sql.Connection</classname> object thus marks the boundary between a <trademark xlink:href="http://www.oracle.com/technetwork/java/javase">JDK</trademark> / <trademark xlink:href="http://www.oracle.com/technetwork/java/javase">JRE</trademark> and a <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> Driver implementation from e.g. Oracle or other institutions.</para> <para><xref linkend="jdbcArchitecture"/> does not show details about the relations between <classname>java.sql.Connection</classname>, <classname>java.sql.Statement</classname> and <classname>java.sql.ResultSet</classname> objects. We start by giving a rough description of the tasks and responsibilities these three types have:</para> <glosslist> <glossentry> <glossterm><classname>java.sql.Connection</classname></glossterm> <glossdef> <para>Holding a permanent connection to a database server. Both client and server can contact each other. The database server may for example terminate a transaction if problems like deadlocks occur.</para> </glossdef> </glossentry> <glossentry> <glossterm><classname>java.sql.Statement</classname></glossterm> <glossdef> <para>We have two distinct classes of actions:</para> <orderedlist> <listitem> <para>Instructions to modify data on the database server. These include <code>INSERT</code>, <code>UPDATE</code> and <code>DELETE</code> operations as far as <abbrev>SQL-DML</abbrev> is concerned. <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> acts as a means of transport and merely returns integer values back to the client like the number of rows being affected by an UPDATE.</para> </listitem> <listitem> <para>Instructions reading data from the server. This is done by sending SELECT statements. 
It is not sufficient to just return integer values: Instead <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> needs to copy complete datasets back to the client to fill containers being accessible by applications. This is being discussed in <xref linkend="jdbcRead"/>.</para> </listitem> </orderedlist> </glossdef> </glossentry> </glosslist> <para>We shed some light on the relationship between these important <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> components and their respective creation:<figure xml:id="jdbcObjectCreation"> <title>Important <trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc">JDBC</trademark> instances and relationships.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcObjectRelation.fig"/> </imageobject> </mediaobject> </figure></para> </section> <section xml:id="writeAccessCoding"> <title>Write access, coding!</title> <para>So how does it actually work with respect to coding? You may want to read <xref linkend="toolingConfigJdbc"/> before starting your exercises. We first prepare a database table using Eclipse's database tools:</para> <figure xml:id="figSchemaPerson"> <title>A relation <code>Person</code> containing names and email addresses</title> <programlisting language="none"><emphasis role="strong">CREATE</emphasis> <emphasis role="strong">TABLE</emphasis> Person ( name CHAR(20) ,email CHAR(20) <emphasis>UNIQUE</emphasis>)</programlisting> </figure> <para>Our actual (toy) <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> application will insert a single object ('Jim', 'jim@foo.org') into the <code>Person</code> relation. 
This is simpler than reading data since no client <classname>java.sql.ResultSet</classname> container is needed:</para> <figure xml:id="figJdbcSimpleWrite"> <title>A simple <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> application inserting data into a relational table.</title> <programlisting language="none">01 package sda.jdbc.intro.v1; 02 03 import java.sql.Connection; 04 import java.sql.DriverManager; 05 import java.sql.SQLException; 06 import java.sql.Statement; 07 08 public class SimpleInsert { 09 10 public static void main(String[] args) throws SQLException { 11 // Step 1: Open a connection to the database server 12 final Connection conn = DriverManager.getConnection( 13 "jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ"); 14 // Step 2: Create a Statement instance 15 final Statement stmt = conn.createStatement(); 16 // Step 3: Execute the desired INSERT 17 final int updateCount = stmt.executeUpdate( 18 "INSERT INTO Person VALUES('Jim', 'jim@foo.org')"); 19 // Step 4: Give feedback to the enduser 20 System.out.println("Successfully inserted " + updateCount + " dataset(s)"); 21 } 22 }</programlisting> </figure> <para>Looks simple? Unfortunately it does not (yet) work:</para> <programlisting language="none">Exception in thread "main" java.sql.SQLException: <emphasis role="bold"> No suitable driver found for jdbc:mysql://localhost:3306/hdm</emphasis> at java.sql.DriverManager.getConnection(DriverManager.java:604) at java.sql.DriverManager.getConnection(DriverManager.java:221) at sda.jdbc.intro.SimpleInsert.main(SimpleInsert.java:12)</programlisting> <para>What's wrong here? In <xref linkend="figureConfigJdbcDriver"/> we needed a <productname xlink:href="http://www.mysql.com">Mysql</productname> <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> Driver implementation <filename>mysql-connector-java.jar</filename> as a prerequisite to open connections to a database server. 
This implementation is mandatory for our toy application as well. All we have to do is add <filename>mysql-connector-java.jar</filename> to our <link linkend="gloss_Java"><trademark>Java</trademark></link> <varname>CLASSPATH</varname> at <emphasis role="bold">runtime</emphasis>.</para> <para>Depending on our <link linkend="gloss_Java"><trademark>Java</trademark></link> environment this is achieved by different means. Eclipse requires the definition of a run configuration as described in <uri xlink:href="http://help.eclipse.org/juno/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-java-local-configuration.htm">http://help.eclipse.org/juno/index.jsp?topic=/org.eclipse.jdt.doc.user/tasks/tasks-java-local-configuration.htm</uri>. When creating a run configuration for <classname>sda.jdbc.intro.v1.SimpleInsert</classname> we have to add <filename>mysql-connector-java.jar</filename> on the <varname>Classpath</varname> tab. The following screenshot shows a working configuration:</para> <figure xml:id="figureConfigRunExtJar"> <title>Creating an Eclipse run configuration containing a <productname xlink:href="http://www.mysql.com">MySQL</productname> <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> driver Jar (marked red).</title> <screenshot> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/runConfigJarAnnot.screen.png" scale="70"/> </imageobject> </mediaobject> </screenshot> </figure> <para>This time execution works as expected:</para> <programlisting language="none">Successfully inserted 1 dataset(s)</programlisting> <qandaset defaultlabel="qanda" xml:id="quandaentry_DupInsert"> <title>Exception on inserting objects</title> <qandadiv> <qandaentry> <question> <para>A second invocation of <classname>sda.jdbc.intro.v1.SimpleInsert</classname> yields the following runtime error:</para> <programlisting language="none">Exception in thread "main" 
com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: <emphasis role="bold">Duplicate entry 'jim@foo.org' for key 'email'</emphasis>
  ...
  at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1617)
  at sda.jdbc.intro.SimpleInsert.main(SimpleInsert.java:17)</programlisting> </question> <answer> <para>This expected error is easy to understand: The exception's message text <emphasis role="bold">Duplicate entry 'jim@foo.org' for key 'email'</emphasis> informs us about a UNIQUE key constraint violation with respect to the attribute <code>email</code> in our schema definition in <xref linkend="figSchemaPerson"/>. We cannot add a second entry with the same value <code>'jim@foo.org'</code>.</para> </answer> </qandaentry> </qandadiv> </qandaset> <para>It is worth mentioning that the <productname xlink:href="http://www.mysql.com">MySQL</productname> driver implementation does not have to be available at compile time. <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> defines interfaces rather than (concrete) classes. The latter are only required at runtime.</para> <para>When working with Eclipse we need a separate run configuration for each runnable <link linkend="gloss_Java"><trademark>Java</trademark></link> application to add the <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> driver implementation to the runtime <envar>CLASSPATH</envar>. This may become tedious. Weighing the pros and cons you may simply add <filename>mysql-connector-java.jar</filename> to your compile time <envar>CLASSPATH</envar> as well. As a drawback all <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> implementing classes will now become visible when e.g. 
hitting auto-completion.</para> <para>We now discuss some important methods defined in the <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> interfaces:</para> <glosslist> <glossentry> <glossterm><classname>java.sql.Connection</classname></glossterm> <glossdef> <itemizedlist> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#createStatement()">createStatement()</link></para> </listitem> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#setAutoCommit(boolean)">setAutoCommit()</link>, <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#getAutoCommit()">getAutoCommit()</link></para> </listitem> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#getWarnings()">getWarnings()</link></para> </listitem> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isClosed()">isClosed()</link>, <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isValid(int)">isValid(int timeout)</link></para> </listitem> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#rollback()">rollback()</link> and <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#commit()">commit()</link></para> </listitem> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#close()">close()</link></para> </listitem> </itemizedlist> </glossdef> </glossentry> <glossentry> <glossterm><classname>java.sql.Statement</classname></glossterm> <glossdef> <itemizedlist> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#executeUpdate(java.lang.String)">executeUpdate(String sql)</link></para> </listitem> <listitem> <para><link 
xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#getConnection()">getConnection()</link></para> </listitem> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#getResultSet()">getResultSet()</link></para> </listitem> <listitem> <para><link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#close()">close()</link> and <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#isClosed()">isClosed()</link></para> </listitem> </itemizedlist> </glossdef> </glossentry> </glosslist> <qandaset defaultlabel="qanda" xml:id="quandaentry_AutoCommit"> <title><trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> and transactions</title> <qandadiv> <qandaentry> <question> <para>How does the method <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#setAutoCommit(boolean)">setAutoCommit()</link> relate to <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#commit()">commit()</link> and <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#rollback()">rollback()</link>?</para> </question> <answer> <para>A connection's default state is <code>autocommit == true</code>. This means that individual SQL statements are executed as separate transactions.</para> <para>If we want to group two or more statements into a transaction we have to:</para> <orderedlist> <listitem> <para>Call <code>connection.setAutoCommit(false)</code></para> </listitem> <listitem> <para>From now on subsequent SQL statements implicitly become part of a transaction until one of the following three events happens:</para> <orderedlist numeration="loweralpha"> <listitem> <para><code>connection.commit()</code></para> </listitem> <listitem> <para><code>connection.rollback()</code></para> </listitem> <listitem> <para>The transaction gets aborted by the database server. 
This may for example happen in case of a deadlock conflict with a second transaction.</para> </listitem> </orderedlist> <para>Note that the first two events are initiated by our client software. The third is carried out by the database server.</para> </listitem> </orderedlist> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset defaultlabel="qanda" xml:id="quandaentry_Close"> <title>Closing <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connections</title> <qandadiv> <qandaentry> <question> <para>Why is it very important to call the close() method for <classname>java.sql.Connection</classname> and / or <classname>java.sql.Statement</classname> instances?</para> </question> <answer> <para>A <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connection ties up network resources (socket connections). These may be exhausted if e.g. new connections get established within a loop without being closed.</para> <para>The situation is comparable to memory leaks when using programming languages lacking a garbage collector.</para> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset defaultlabel="qanda" xml:id="quandaentry_AbortTran"> <title>Aborted transactions</title> <qandadiv> <qandaentry> <question> <para>In the previous exercise we mentioned the possibility of a transaction abort issued by the database server. Which responsibility arises for an application programmer? Hint: How may an implementation become aware of such a transaction abort?</para> </question> <answer> <para>If a database server aborts a transaction a <classname>java.sql.SQLException</classname> will be thrown. 
An application must be aware of this possibility and thus implement a sensible <code>catch(...)</code> clause accordingly.</para> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset defaultlabel="qanda" xml:id="exerciseJdbcWhyInterface"> <title>Interfaces and classes in <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark></title> <qandadiv> <qandaentry> <question> <para>The <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> standard mostly defines interfaces such as <classname>java.sql.Connection</classname> and <classname>java.sql.Statement</classname>. Why are these not defined as classes? Moreover why is <classname>java.sql.DriverManager</classname> defined as a class rather than an interface?</para> <para>You may want to supply code examples to explain your argumentation.</para> </question> <answer> <para><xref linkend="jdbcArchitecture"/> tells us about the vendor independent architecture of <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>. Oracle for example may implement a class <code>com.oracle.jdbc.OracleConnection</code>:</para> <programlisting annotations="nojavadoc" language="java">package com.oracle.jdbc;

import java.sql.Connection;
import java.sql.Statement;
import java.sql.SQLException;

public class OracleConnection implements Connection {
  ...
  // Interface methods must be declared public in the implementing class
  @Override
  public Statement createStatement(int resultSetType,
      int resultSetConcurrency) throws SQLException {
    // Implementation omitted here due to
    // limited personal hacking capabilities
    ...
  }
  ...
}</programlisting> <para>If a programmer only uses the <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> interfaces rather than a vendor's classes it is much easier to make the resulting application work with different databases from other vendors. 
This way a company's implementation is not exposed to our own <link linkend="gloss_Java"><trademark>Java</trademark></link> code.</para> <para>Regarding the special role of <classname>java.sql.DriverManager</classname> we notice the need for a starting point: We have to create an initial instance of some class. In theory (<emphasis role="bold">BUT NOT IN PRACTICE!!!</emphasis>) the following (ugly) code might be possible:</para> <programlisting language="none">package my.personal.application;

import java.sql.Connection;
import java.sql.Statement;
import java.sql.SQLException;

import com.oracle.jdbc.OracleConnection; // the vendor class sketched above

public class SomeClass {

  public void someMethod(){
    Connection conn = <emphasis role="bold">new OracleConnection()</emphasis>; // bad idea!
    ...
  }
  ...
}</programlisting> <para>The problem with this approach is the explicit constructor call: Whenever we want to use another database we have two possibilities:</para> <itemizedlist> <listitem> <para>Rewrite our code.</para> </listitem> <listitem> <para>Introduce some sort of switch statement to provide a fixed number of databases beforehand:</para> <programlisting language="none">public void someMethod(final String vendor){

  final Connection conn;

  switch(vendor) {
    case "ORACLE":
      conn = new OracleConnection();
      break;
    case "DB2":
      conn = new Db2Connection();
      break;
    default:
      conn = null;
      break;
  }
  ...
}</programlisting> <para>Adding a new database still requires code rewriting.</para> </listitem> </itemizedlist> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset defaultlabel="qanda" xml:id="quandaentry_DriverDispatch"> <title>Driver dispatch mechanism</title> <qandadiv> <qandaentry> <question> <para>In exercise <xref linkend="exerciseJdbcWhyInterface"/> we saw a hypothetical way to solve the interface/class resolution problem by using a switch clause. How is this <code>switch</code> clause's logic actually realized in a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> based application? 
(<quote>behind the scenes</quote>)</para> <para>Hint: Read the documentation of <classname>java.sql.DriverManager</classname>.</para> </question> <answer> <para>Prior to opening a Connection a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> driver registers itself with the <classname>java.sql.DriverManager</classname> class. For this purpose the standard defines the static method <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/DriverManager.html#registerDriver(java.sql.Driver)">registerDriver(Driver)</link>. On success the <classname>java.sql.DriverManager</classname> adds the driver to its internal registry, which we may picture as a dictionary:</para> <informaltable border="1"> <col width="20%"/> <col width="30%"/> <tr> <th>protocol</th> <th>driver instance</th> </tr> <tr> <td>jdbc:mysql</td> <td>mysqlDriver instance</td> </tr> <tr> <td>jdbc:oracle</td> <td>oracleDriver instance</td> </tr> <tr> <td>...</td> <td>...</td> </tr> </informaltable> <para>So whenever the method <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/DriverManager.html#getConnection(java.lang.String,%20java.lang.String,%20java.lang.String)">getConnection()</link> is called the <classname>java.sql.DriverManager</classname> will scan the <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> URL and isolate the protocol part. If we start with <code>jdbc:mysql://someserver.com:3306/someDatabase</code> this is just <code>jdbc:mysql</code>. The value is then looked up in the above table of registered drivers to choose an appropriate instance or null otherwise. 
This way our hypothetical switch including the default value null is actually implemented.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="propertiesFile"> <title>Connection properties</title> <para>So far our application depicted in <xref linkend="figJdbcSimpleWrite"/> suffers both from missing error handling and hard-coded parameters.</para> <para>Professional applications must be configurable. Changing the password currently requires source code modification and recompilation. <link linkend="gloss_Java"><trademark>Java</trademark></link> offers a standard procedure to externalize parameters like <varname>username</varname>, <varname>password</varname> and the <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connection URL appearing in <xref linkend="figJdbcSimpleWrite"/>: We may move these parameters to so-called properties files:</para> <figure xml:id="propertyExternalization"> <title>Externalize a single string <code>"User name"</code> to a separate file <filename>message.properties</filename>.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/externalize.fig"/> </imageobject> </mediaobject> </figure> <para>The figure above shows the externalization of just a single property. The file <filename>message.properties</filename> contains key-value pairs. The key <code>PropHello.uname</code> maps to the value <code>User name</code>. Multiple strings may be externalized to the same properties file.</para> <para>Eclipse offers tool support for externalization. Simply choose Source --> Externalize Strings from the context menu. 
This activates a wizard to define property keys, rename the generated helper class and finally create the actual <filename>message.properties</filename> file.</para> <qandaset defaultlabel="qanda" xml:id="quandaentry_WritProps"> <title>Moving <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> <abbrev xlink:href="http://www.ietf.org/rfc/rfc1738.txt">URL</abbrev> and credentials to a property file</title> <qandadiv> <qandaentry> <question> <para>Start by executing the code given in <xref linkend="figJdbcSimpleWrite"/>. Then extend this example by externalizing all <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> related connection parameters to a <filename>jdbc.properties</filename> file like:</para> <programlisting language="none">SimpleInsert.jdbcUrl=jdbc:mysql://localhost:3306/hdm
SimpleInsert.password=XYZ
SimpleInsert.username=hdmuser</programlisting> <para>As stated earlier the Eclipse wizard assists you by generating both the properties file and a helper class reading that file at runtime.</para> </question> <answer> <para>The current exercise is mostly related to tooling. From our <link linkend="gloss_Java"><trademark>Java</trademark></link> code the context menu allows us to choose the desired wizard:</para> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/externalize.screen.png"/> </imageobject> </mediaobject> </informalfigure> <para>We may now:</para> <itemizedlist> <listitem> <para>Select the strings to be externalized.</para> </listitem> <listitem> <para>Supply key names. In the subsequent screenshot this task has already been started by manually replacing the default <code>SimpleInsert.1</code> by <code>SimpleInsert.jdbc</code>.</para> </listitem> <listitem> <para>Redefine other parameters like prefix, properties file name etc. 
In the following screenshot only the first of three keys has been manually renamed to the sensible value <varname>SimpleInsert.jdbc</varname>.</para> </listitem> </itemizedlist> <informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/externalize2.screen.png"/> </imageobject> </mediaobject> </informalfigure> <para>The wizard also generates a class <classname>sda.jdbc.intro.v1.DbProps</classname> to actually access our properties:</para> <programlisting language="none">package sda.jdbc.intro.v1;
...
public class DbProps {
  private static final String BUNDLE_NAME = "sda.jdbc.intro.v1.database";

  private static final ResourceBundle RESOURCE_BUNDLE = ResourceBundle
      .getBundle(BUNDLE_NAME);

  private DbProps() {
  }

  public static String getString(String key) {
    try {
      return RESOURCE_BUNDLE.getString(key);
    } catch (MissingResourceException e) {
      return '!' + key + '!';
    }
  }
}</programlisting> <para>Our <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> related code now contains three references to external properties:</para> <programlisting language="none">package sda.jdbc.intro.v1; ... 
public class SimpleInsert {

  public static void main(String[] args) throws SQLException {
    // Step 1: Open a connection to the database server
    final Connection conn = DriverManager.getConnection (
        <emphasis role="bold">DbProps.getString("PersistenceHandler.jdbcUrl"), </emphasis>
        <emphasis role="bold">DbProps.getString("PersistenceHandler.username")</emphasis>,
        <emphasis role="bold">DbProps.getString("PersistenceHandler.password")</emphasis>);
    // Step 2: Create a Statement instance
    final Statement stmt = conn.createStatement();
    // Step 3: Execute the desired INSERT
    final int updateCount = stmt.executeUpdate(
        "INSERT INTO Person VALUES('Jim', 'jim@foo.org')");
    // Step 4: Give feedback to the end user
    System.out.println("Successfully inserted " + updateCount + " dataset(s)");
  }
}</programlisting> <para>The current key prefix <code>PersistenceHandler</code> relates to the class <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> from a later exercise.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="sectSimpleInsertGui"> <title>A first GUI sketch</title> <para>So far all data records being transferred to the database server are still hard-coded in our application. In practice a user wants to enter data of persons to be submitted to the database.</para> <para>We now guide you to develop a first version of a simple GUI for this task. A more <link linkend="figureDataInsert2">elaborate version</link> will be presented in a follow-up exercise. The screenshot illustrates the intended application behaviour:</para> <figure xml:id="simpleInsertGui"> <title>A simple GUI to insert data into a database server.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/simpleInsertGui.screen.png"/> </imageobject> <caption> <para>After clicking <quote>Insert</quote> a message is presented to the user. This message may as well indicate a failure.</para> </caption> </mediaobject> </figure> <para>Implementing Swing GUI applications requires knowledge as taught in e.g. 
<link xlink:href="http://www.hdm-stuttgart.de/studenten/stundenplan/vorlesungsverzeichnis/vorlesung_detail?vorlid=5212221">113300 Entwicklung von Web-Anwendungen</link>. If you do not (yet) feel comfortable writing <productname xlink:href="http://docs.oracle.com/javase/tutorial/uiswing/index.html">Swing</productname> applications you may want to read <uri xlink:href="http://www.javamex.com/tutorials/swing">http://www.javamex.com/tutorials/swing</uri> and <emphasis role="bold">really</emphasis> understand the examples presented therein.</para> <qandaset defaultlabel="qanda" xml:id="quandaentry_GuiDb"> <title>GUI for inserting Person data to a database server</title> <qandadiv> <qandaentry> <question> <para>Write a GUI application as outlined in <xref linkend="simpleInsertGui"/>. You may proceed as follows:</para> <orderedlist> <listitem> <para>Write a dummy GUI without any database functionality. Only present the two labels and input fields and the Insert button.</para> </listitem> <listitem> <para>Add a <classname>java.awt.event.ActionListener</classname> which generates a SQL INSERT statement when clicking the Insert button. Return this string to the user as shown in the message window of <xref linkend="simpleInsertGui"/>.</para> <para>At this point you still do not need a database connection. The message shown to the user is just a fake, so the GUI <emphasis role="bold">appears</emphasis> to be working.</para> </listitem> <listitem> <para>Establish a <classname>java.sql.Connection</classname> and create a <classname>java.sql.Statement</classname> instance when launching your application. Use the latter in your <classname>java.awt.event.ActionListener</classname> to actually insert datasets into your database.</para> </listitem> </orderedlist> </question> <answer> <para>The complete implementation resides in <classname>sda.jdbc.intro.v01.InsertPerson</classname>:</para> <programlisting language="none">package sda.jdbc.intro.v01; import ... 
public class InsertPerson extends JFrame {
  ...
  public InsertPerson () throws SQLException{
    super ("Add a person's data");

    setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

    final JPanel databaseFieldPanel = new JPanel();
    databaseFieldPanel.setLayout(new GridLayout(0,2));
    add(databaseFieldPanel, BorderLayout.CENTER);

    databaseFieldPanel.add(new JLabel("Name:"));
    final JTextField nameField = new JTextField(15);
    databaseFieldPanel.add(nameField);

    databaseFieldPanel.add(new JLabel("E-mail:"));
    final JTextField emailField = new JTextField(15);
    databaseFieldPanel.add(emailField);

    final JButton insertButton = new JButton("Insert");
    add(insertButton, BorderLayout.SOUTH);

    final Connection conn = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/hdm", "hdmuser", "XYZ");
    final Statement stmt = conn.createStatement();

    insertButton.addActionListener(new ActionListener() {
      // Linking the GUI to the database server. We assume an open
      // connection and a correctly initialized Statement instance
      @Override
      public void actionPerformed(ActionEvent event) {
        final String sql = "INSERT INTO Person VALUES('" + nameField.getText()+ "', '"
            + emailField.getText() + "')";
        // We have to catch this Exception because an ActionListener's signature
        // prohibits the existence of a "throws" clause.
        try {
          final int updateCount = stmt.executeUpdate(sql);
          JOptionPane.showMessageDialog(null, "Successfully executed \n'" + sql + "'\nand inserted "
              + updateCount + " dataset");
        } catch (SQLException e) {
          e.printStackTrace();
        }
      }
    });
    pack();
  }
}</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="jdbcExceptions"> <title>Handling possible exceptions</title> <para>Our current code lacks any kind of error handling: Exceptions will not be caught at all and invariably lead to program termination. This is of course inadequate for professional software. 
In case of problems we have to:</para> <itemizedlist> <listitem> <para>Gracefully recover or shut down our application. We may for example show a pop up window <quote>Terminating due to an internal error</quote>.</para> </listitem> <listitem> <para>Enable the customer to supply the development team with helpful information. The user may for example be asked to submit a log file in case of errors.</para> </listitem> </itemizedlist> <para>In addition the solution <classname>sda.jdbc.intro.v01.InsertPerson</classname> contains an ugly mix of GUI components and database related code. We take a first step to decouple these two distinct concerns:</para> <qandaset defaultlabel="qanda" xml:id="quandaentry_DbLayer"> <title>Handling the database layer</title> <qandadiv> <qandaentry> <question> <para>Implement a class <code>PersistenceHandler</code> to be later used as a component of our next step GUI application prototype. This class should have the following methods:</para> <programlisting language="none">...
/**
 * Handle database communication. There are two
 * distinct internal states <q>disconnected</q> and <q>connected</q>, see
 * {@link #isConnected()}. These two states may be toggled by invoking
 * {@link #connect()} and {@link #disconnect()} respectively.
 *
 * The following snippet illustrates the intended usage:
 * <pre> public static void main(String[] args) {
    final PersistenceHandler ph = new PersistenceHandler();
    if (ph.connect()) {
      if (!ph.add("Jim", "jim@foo.com")) {
        System.err.println("Insert Error:" + ph.getErrorMessage());
      }
    } else {
      System.err.println("Connect error:" + ph.getErrorMessage());
    }
  }</pre>
 *
 * @author goik
 */
public class PersistenceHandler {
  ...
  /**
   * Instance in <q>disconnected</q> state. See {@link #isConnected()}
   */
  public PersistenceHandler() {/* only present here to supply Javadoc comment */}

  /**
   * Inserting a (name, email) record into the database server. 
   * In case of errors corresponding messages may subsequently be retrieved by calling
   * {@link #getErrorMessage()}.
   *
   * <dt><b>Precondition:</b></dt> <dd>must be in
   * <q>connected</q> state, see {@link #isConnected()}</dd>
   *
   * @param name
   *          A person's name
   * @param email
   *          A person's email address
   *
   * @return true if the current data record has been successfully inserted
   *         into the database server. false in case of error(s).
   */
  public boolean add(final String name, final String email){
    ...
  }

  /**
   * Retrieving error messages in case a call to {@link #add(String, String)},
   * {@link #connect()}, or {@link #disconnect()} yields an error.
   *
   * @return the error explanation corresponding to the latest failed
   *         operation, null if no error yet occurred.
   */
  public String getErrorMessage() {
    return ...;
  }

  /**
   * Open a connection to a database server.
   *
   * <dt><b>Precondition:</b></dt>
   * <dd>must be in <q>disconnected</q> state, see {@link #isConnected()}</dd>
   *
   * <dt><b>Precondition:</b></dt>
   * <dd>The following properties must be set:
   * <pre>PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm
PersistenceHandler.password=XYZ
PersistenceHandler.username=foo</pre>
   * </dd>
   *
   * @return true if connecting was successful
   */
  public boolean connect () {
    ...
  }

  /**
   * Close a connection to a database server and clean up JDBC related resources
   *
   * Error messages in case of failure may subsequently be retrieved by
   * calling {@link #getErrorMessage()}.
   *
   * <dt><b>Precondition:</b></dt>
   * <dd>must be in <q>connected</q> state, see {@link #isConnected()}</dd>
   *
   * @return true if disconnecting was successful, false in case error(s) occur.
   */
  public boolean disconnect() {
    ...
  }

  /**
   * An instance can either be in <q>connected</q> or <q>disconnected</q> state. The
   * state can be toggled by invoking {@link #connect()} or
   * {@link #disconnect()} respectively. 
   *
   * @return true if connected, false otherwise
   */
  public boolean isConnected() {
    return ...;
  }
}</programlisting> <para>Notice the two internal states <quote>disconnected</quote> and <quote>connected</quote>:</para> <figure xml:id="figPersistenceHandlerStates"> <title>Possible states and transitions for instances of <code>PersistenceHandler</code>.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/persistHandlerStates.fig"/> </imageobject> </mediaobject> </figure> <para>According to the above documentation a newly created <code>PersistenceHandler</code> instance should be in disconnected state. As shown in the <link linkend="gloss_Java"><trademark>Java</trademark></link> class description you may test your implementation without any GUI code. If you are already familiar with unit testing this might be a good start as well.</para> </question> <answer> <para>We show a possible implementation of <classname>sda.jdbc.intro.v1.PersistenceHandler</classname>:</para> <programlisting language="none">package sda.jdbc.intro.v1;
...
public class PersistenceHandler {

  Connection conn = null;
  Statement stmt = null;

  String errorMessage = null;

  /**
   * New instances are in <q>disconnected</q> state. See {@link #isConnected()}
   */
  public PersistenceHandler() {/* only present here to supply Javadoc comment */}

  /**
   * Inserting a (name, email) record into the database server. In case of
   * errors corresponding messages may subsequently be retrieved by calling
   * {@link #getErrorMessage()}.
   *
   * <dt><b>Precondition:</b></dt> <dd>must be in
   * <q>connected</q> state, see {@link #isConnected()}</dd>
   *
   * @param name
   *          A person's name
   * @param email
   *          A person's email address
   *
   * @return true if the current data record has been successfully inserted
   *         into the database server. false in case of error(s). 
   */
  public boolean add(final String name, final String email){
    final String sql = "INSERT INTO Person VALUES('" + name + "', '" + email + "')";
    try {
      stmt.executeUpdate(sql);
      return true;
    } catch (SQLException e) {
      errorMessage = "Unable to execute '" + sql + "': '" + e.getMessage() + "'";
      return false;
    }
  }

  /**
   * Retrieving error messages in case a call to {@link #add(String, String)},
   * {@link #connect()}, or {@link #disconnect()} yields an error.
   *
   * @return the error explanation corresponding to the latest failed
   *         operation, null if no error yet occurred.
   */
  public String getErrorMessage() {
    return errorMessage;
  }

  /**
   * Open a connection to a database server.
   *
   * <dt><b>Precondition:</b></dt>
   * <dd>must be in <q>disconnected</q> state, see {@link #isConnected()}</dd>
   *
   * <dt><b>Precondition:</b></dt>
   * <dd>The following properties must be set:
   * <pre>PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm
PersistenceHandler.password=XYZ
PersistenceHandler.username=foo</pre>
   * </dd>
   *
   * @return true if connecting was successful
   */
  public boolean connect () {
    try {
      conn = DriverManager.getConnection(
          DbProps.getString("PersistenceHandler.jdbcUrl"),
          DbProps.getString("PersistenceHandler.username"),
          DbProps.getString("PersistenceHandler.password"));
      try {
        stmt = conn.createStatement();
        return true;
      } catch (SQLException e) {
        errorMessage = "Connection opened but Statement creation failed:\"" + e.getMessage() + "\".";
        try {
          conn.close();
        } catch (SQLException ee) {
          // Report the close failure itself (ee), not the outer exception
          errorMessage += "Closing connection failed:\"" + ee.getMessage() + "\".";
        }
        conn = null;
      }
    } catch (SQLException e) {
      errorMessage = "Unable to open connection:\"" + e.getMessage() + "\".";
    }
    return false;
  }

  /**
   * Close a connection to a database server and clean up JDBC related resources
   *
   * Error messages in case of failure may subsequently be retrieved by
   * calling {@link #getErrorMessage()}. 
* * <dt><b>Precondition:</b></dt> * <dd>must be in <q>connected</q> state, see {@link #isConnected()}</dd> * * @return true if disconnecting was successful, false in case error(s) occur. */ public boolean disconnect() { boolean resultStatus = true; final StringBuffer messageCollector = new StringBuffer(); try { stmt.close(); } catch (SQLException e) { resultStatus = false; messageCollector.append("Unable to close Statement:\"" + e.getMessage() + "\"."); } stmt = null; try { conn.close(); } catch (SQLException e) { resultStatus = false; messageCollector.append("Unable to close connection:\"" + e.getMessage() + "\"."); } conn = null; if (!resultStatus) { errorMessage = messageCollector.toString(); } return resultStatus; } /** * An instance can either be in <q>connected</q> or <q>disconnected</q> state. The * state can be toggled by invoking {@link #connect()} or * {@link #disconnect()} respectively. * * @return true if connected, false otherwise */ public boolean isConnected() { return null != conn; } }</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> <para>We may now complete the next enhancement step of our GUI database client.</para> <qandaset defaultlabel="qanda" xml:id="exerciseGuiWriteTakeTwo"> <title>Connection on user action</title> <qandadiv> <qandaentry> <question> <label>An application writing records to a database server</label> <para>Our aim is to enhance the first GUI prototype being described in <xref linkend="simpleInsertGui"/>. The application shall start being disconnected from the database server. Prior to entering data the user shall be guided to open a connection. 
The following video illustrates the desired user interface:</para> <figure xml:id="figureDataInsert2"> <title>A GUI frontend for adding personal data to a server.</title> <mediaobject> <videoobject> <videodata fileref="Ref/Video/dataInsert.mp4"/> </videoobject> </mediaobject> </figure> <para>If a user closes the main window while still connected, a disconnect from the database server shall be enforced. For this purpose we must handle the event raised when the user clicks the close button within the window decoration. An exit handler method is required to terminate a potentially open database connection.</para> </question> <answer> <para>Our implementation uses the class <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> for handling all database communication. The GUI needs to visualize the two different states <quote>disconnected</quote> and <quote>connected</quote>. In <quote>disconnected</quote> state the whole input pane for entering datasets and clicking the <quote>Insert</quote> button is locked, so the user is forced to actively open a database connection.</para> <para>Notice also the <classname>java.awt.event.WindowAdapter</classname> implementation executed when closing the application's main window. The <methodname>java.awt.event.WindowAdapter.windowClosing(java.awt.event.WindowEvent)</methodname> method closes any existing database connection, thus freeing resources.</para> <programlisting language="none">package sda.jdbc.intro.v1; import ... 
public class InsertPerson extends JFrame { private static final long serialVersionUID = 6815975741605247675L; final PersistenceHandler persistenceHandler = new PersistenceHandler(); final JTextField nameField = new JTextField(15), emailField = new JTextField(20); final JButton toggleConnectButton = new JButton(), insertButton = new JButton("Insert"); final JPanel databaseFieldPanel = new JPanel(); private void setGuiConnectionState(final boolean state) { if (state) { toggleConnectButton.setText("Disconnect"); } else { toggleConnectButton.setText("Connect"); } for (final Component c: databaseFieldPanel.getComponents()){ c.setEnabled(state); } } public static void main(String[] args) throws SQLException { InsertPerson app = new InsertPerson(); app.setVisible(true); } public InsertPerson (){ super ("Add a person's data"); setSize(500, 500); addWindowListener(new WindowAdapter() { // In case a user closes our application window while still being connected // we have to close the database connection. 
@Override public void windowClosing(WindowEvent e) { super.windowClosing(e); if (persistenceHandler.isConnected() && !persistenceHandler.disconnect()) { System.exit(1); } else { System.exit(0); } } }); Box top = Box.createHorizontalBox(); add(top, BorderLayout.NORTH); top.add(toggleConnectButton); toggleConnectButton.addActionListener(new ActionListener() { @Override public void actionPerformed(ActionEvent e) { if (persistenceHandler.isConnected()) { if (persistenceHandler.disconnect()){ setGuiConnectionState(false); } else { JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); } } else { if (persistenceHandler.connect()){ setGuiConnectionState(true); } else { JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); } } } }); databaseFieldPanel.setLayout(new GridLayout(0,2)); add(databaseFieldPanel); databaseFieldPanel.add(new JLabel("Name:")); databaseFieldPanel.add(nameField); databaseFieldPanel.add(new JLabel("E-mail:")); databaseFieldPanel.add(emailField); insertButton.addActionListener(new ActionListener() { @Override public void actionPerformed(ActionEvent e) { if (persistenceHandler.add(nameField.getText(), emailField.getText())) { nameField.setText(""); emailField.setText(""); JOptionPane.showMessageDialog(null, "Successfully inserted dataset"); } else { JOptionPane.showMessageDialog(null, persistenceHandler.getErrorMessage()); } } }); databaseFieldPanel.add(Box.createGlue()); databaseFieldPanel.add(insertButton); setGuiConnectionState(false); pack(); } }</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="jdbcSecurity"> <title><trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> and security</title> <section xml:id="jdbcSecurityNetwork"> <title>Network sniffing</title> <para>Sniffing <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> network traffic is one possibility for intruders to compromise 
database applications. This requires physical access to any of the following:</para> <itemizedlist> <listitem> <para>Server host</para> </listitem> <listitem> <para>Client host</para> </listitem> <listitem> <para>Intermediate hub, switch or router</para> </listitem> </itemizedlist> <figure xml:id="figJdbcSniffing"> <title>Sniffing a <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> connection by an intruder.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcSniffing.fig"/> </imageobject> </mediaobject> </figure> <para>We demonstrate a possible attack by analyzing the network traffic between our application shown in <xref linkend="figJdbcSimpleWrite"/> and the <productname xlink:href="http://www.mysql.com">Mysql</productname> database server. Prior to starting the application we set up <productname xlink:href="http://www.wireshark.org">Wireshark</productname> for filtered capturing:</para> <itemizedlist> <listitem> <para>Connecting to the <varname>loopback</varname> (lo) interface only. This is sufficient since our client connects to <varname>localhost</varname>.</para> </listitem> <listitem> <para>Discarding all packets that are not of type <acronym xlink:href="http://en.wikipedia.org/wiki/Transmission_Control_Protocol">TCP</acronym> or do not use port number 3306, so only TCP traffic on port 3306 is captured.</para> </listitem> </itemizedlist> <para>This yields the following capture, shortened for the sake of brevity:</para> <programlisting language="none">[... 5.5.24-0ubuntu0.12.04.1.%...X*e?I1ZQ...................e,F[yoA5$T[N.mysql_native_password. A...........!.......................hdmuser <co xml:id="tcpCaptureUsername"/>......U.>S.%..~h...!.xhdm............j..../* ... INSERT INTO Person VALUES('Jim', 'jim@foo.org') <co xml:id="tcpCaptureSqlInsert"/>6... 
.&.#23000Duplicate entry 'jim@foo.org' for key 'email' <co xml:id="tcpCaptureErrmsg"/></programlisting> <calloutlist> <callout arearefs="tcpCaptureUsername"> <para>The <varname>username</varname> initiating the connection to the database server.</para> </callout> <callout arearefs="tcpCaptureSqlInsert"> <para>The <code>INSERT ...</code> statement.</para> </callout> <callout arearefs="tcpCaptureErrmsg"> <para>The resulting error message being sent back to the client.</para> </callout> </calloutlist> <para>Something seems to be missing here: The user's password. Our code in <xref linkend="figJdbcSimpleWrite"/> contains the password <quote><varname>XYZ</varname></quote> in clear text. But even the search function of <productname xlink:href="http://www.wireshark.org">Wireshark</productname> does not find any such string within the above capture. The <productname xlink:href="http://www.mysql.com">Mysql</productname> documentation however <link xlink:href="http://dev.mysql.com/doc/refman/5.0/en/security-against-attack.html">reveals</link> that everything but the password is transmitted in clear text. So all we might identify is a hash of <code>XYZ</code>.</para> <para>So regarding our (current) <productname xlink:href="http://www.mysql.com">Mysql</productname> implementation the impact of this attack type is somewhat limited but still severe: All data being transmitted between client and server may be disclosed. This typically comprises sensitive data as well. Possible solutions:</para> <itemizedlist> <listitem> <para>Create an encrypted tunnel between client and server, e.g. 
<link xlink:href="http://www.debianadmin.com/howto-use-ssh-local-and-remote-port-forwarding.html">ssh port forwarding</link> or <link xlink:href="http://de.wikipedia.org/wiki/Virtual_Private_Network">VPN</link>.</para> </listitem> <listitem> <para>Many database vendors <link xlink:href="http://dev.mysql.com/doc/refman/5.1/de/connector-j-reference-using-ssl.html">supply SSL</link> or similar <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> protocol encryption extensions. This requires additional configuration procedures like setting up server side certificates. Moreover, similar to the http/https protocols, encryption generally slows down data traffic.</para> </listitem> </itemizedlist> <para>Of course this is only relevant if the transport layer is considered to be insecure. If both server and client reside within the same trusted infrastructure no action has to be taken. We also note that this kind of problem is not limited to <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>. In fact all protocols lacking encryption are subject to this type of attack.</para> </section> <section xml:id="sqlInjection"> <title>SQL injection</title> <para>Before diving into technical details we shed some light on the possible impact of the common attack type described in this section. Our example is the well known Heartland Payment Systems data breach:</para> <figure xml:id="figHeartlandSecurityBreach"> <title>Summary about possible SQL injection impact based on the Heartland security breach</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/heartland.fig"/> </imageobject> </mediaobject> </figure> <para>Why should we be concerned with SQL injection? In the introduction of <xref linkend="bib_Clarke09"/> a compelling argument is given:</para> <blockquote> <para>Many people say they know what SQL injection is, but all they have heard about or experienced are trivial examples. 
SQL injection is one of the most devastating vulnerabilities to impact a business, as it can lead to exposure of all of the sensitive information stored in an application's database, including handy information such as usernames, passwords, names, addresses, phone numbers, and credit card details.</para> </blockquote> <para>Due to limited resources this lecture only deals with the trivial examples mentioned above. One possible way SQL injection attacks work is by inserting SQL code into fields designed for end user input:</para> <figure xml:id="figSqlInject"> <title>SQL injection triggered by ordinary user input.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/sqlinject.fig"/> </imageobject> </mediaobject> </figure> <qandaset defaultlabel="qanda" xml:id="sqlInjectDropTable"> <title>Attack from the dark side</title> <qandadiv> <qandaentry> <question> <para>Use the application from <xref linkend="exerciseGuiWriteTakeTwo"/> and <xref linkend="figSqlInject"/> to launch a SQL injection attack. We provide some hints:</para> <orderedlist> <listitem> <para>The <productname xlink:href="http://www.mysql.com">Mysql</productname> <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> driver implementation already provides precautions to hamper SQL injection attacks. In its default configuration a sequence of SQL commands separated by semicolons (<quote>;</quote>) will not be executed but flagged as a SQL syntax error. We take an example:</para> <programlisting language="none">INSERT INTO Person VALUES (...);DROP TABLE Person</programlisting> <para>In order to execute these so-called multi-statement queries we explicitly have to enable a <productname xlink:href="http://www.mysql.com">Mysql</productname> property. 
This may be achieved by extending our <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> URL:</para> <programlisting language="none">jdbc:mysql://localhost:3306/hdm?<emphasis role="bold">allowMultiQueries=true</emphasis></programlisting> <para>The <productname xlink:href="http://www.mysql.com">Mysql</productname> manual <link xlink:href="http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-configuration-properties.html">contains</link> a remark regarding this parameter:</para> <remark>Notice that this has the potential for SQL injection if using plain java.sql.Statements and your code doesn't sanitize input correctly.</remark> <para>In other words: You have been warned!</para> </listitem> <listitem> <para>You may now use either of the two input fields <quote>name</quote> or <quote>email</quote> to inject arbitrary SQL code.</para> </listitem> </orderedlist> </question> <answer> <para>We construct a suitable string to be injected in order to drop our <code>Person</code> table:</para> <programlisting language="none">Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> <para>Entering this string into the name field effectively destroys our <code>Person</code> relation. As the error message shows, two INSERT statements are separated by a DROP TABLE statement. So after executing the first INSERT our database server drops the whole table. Finally the second INSERT statement fails, giving rise to an error message no end user will ever understand:</para> <figure xml:id="figSqlInjectDropPerson"> <title>Dropping the <code>Person</code> table by SQL injection</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/sqlInject.screen.png"/> </imageobject> </mediaobject> </figure> <para>According to the message text the table <code>Person</code> gets dropped as expected. Thus the subsequent (second) <code>INSERT</code> action is bound to fail.</para> <para>In practice this result may be avoided. 
The database user will (hopefully!) not have sufficient permissions to drop the whole table. Malicious modifications by INSERT, UPDATE or DELETE statements are still possible.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="sanitizeUserInput"> <title>Sanitizing user input</title> <para>There are at least two general ways to deal with the disastrous result of <xref linkend="sqlInjectDropTable"/>:</para> <itemizedlist> <listitem> <para>Keep the database server from interpreting user input completely. This is probably the best way and will be discussed in <xref linkend="sectPreparedStatements"/>.</para> </listitem> <listitem> <para>Let the application check and process user input. Dangerous user input may be modified prior to being embedded in SQL statements or rejected completely.</para> </listitem> </itemizedlist> <para>The first method is definitely superior in most cases. There are however cases where the restrictions it implies are too severe. We may for example choose dynamically which tables shall be accessed. So an SQL statement's structure rather than just its predicates is affected by user input. There are at least two standard procedures dealing with this problem:</para> <glosslist> <glossentry> <glossterm>Input Filtering</glossterm> <glossdef> <para>In the simplest case we check a user's input by regular expressions. An example is an input field in a login window representing a system user name. Legal input may allow letters and digits only. Special characters, whitespace etc. are typically prohibited. The input has a minimum length of one character. A maximum length may be imposed as well. So we may choose the regular expression <code>[A-Za-z0-9]+</code> to check valid user names.</para> </glossdef> </glossentry> <glossentry> <glossterm><foreignphrase>Whitelisting</foreignphrase></glossterm> <glossdef> <para>In many cases input fields only allow a restricted set of values. 
Consider an input field for names of planets. An application may keep a dictionary table to validate user input:</para> <informaltable border="1"> <col width="10%"/> <col width="5%"/> <tr> <td>Mercury</td> <td>1</td> </tr> <tr> <td>Venus</td> <td>2</td> </tr> <tr> <td>Earth</td> <td>3</td> </tr> <tr> <td>...</td> <td>...</td> </tr> <tr> <td>Neptune</td> <td>8</td> </tr> <tr> <td><emphasis role="bold">Default:</emphasis></td> <td><emphasis role="bold">0</emphasis></td> </tr> </informaltable> <para>So if a user enters a valid planet name a corresponding number representing this particular planet will be sent to the database. If the user enters an invalid string an error message may be raised.</para> <para>In a GUI this is often better accomplished by presenting the list of planets to choose from. In this case a user has no chance to enter invalid or even malicious code.</para> </glossdef> </glossentry> </glosslist> <para>So we have an <quote>interceptor</quote> sitting between user input fields and SQL generating code:</para> <figure xml:id="figInputFiltering"> <title>Validating user input prior to dynamically composing SQL statements.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/filtering.fig"/> </imageobject> </mediaobject> </figure> <qandaset defaultlabel="qanda" xml:id="quandaentry_RegexpUse"> <title>Using regular expressions in <link linkend="gloss_Java"><trademark>Java</trademark></link></title> <qandadiv> <qandaentry> <question> <para>This exercise is a preparation for <xref linkend="exercisefilterUserInput"/>. The aim is to deal with regular expressions and to use them in <link linkend="gloss_Java"><trademark>Java</trademark></link>. 
If you are not yet familiar with regular expressions / pattern matching you may want to read any of:</para> <itemizedlist> <listitem> <para><link xlink:href="http://www.aivosto.com/vbtips/regex.html">Regular expressions - An introduction</link></para> </listitem> <listitem> <para><link xlink:href="http://www.codeproject.com/Articles/939/An-Introduction-to-Regular-Expressions">An Introduction to Regular Expressions</link></para> </listitem> <listitem> <para><link xlink:href="http://www.regular-expressions.info/tutorial.html">Regular Expression Tutorial</link></para> </listitem> </itemizedlist> <para>Complete the implementation of the following skeleton:</para> <programlisting language="none">... import java.util.regex.Matcher; import java.util.regex.Pattern; public static void main(String[] args) { final String [] wordList = new String [] {"Eric", "126653BBb", "_login","some text"}; final String [] regexpList = new String[] {"[A-K].*", "[^0-9]+.*", "_[a-z]+", ""}; for (final String word: wordList) { for (final String regexp: regexpList) { testMatch(word, regexp); } } } /** * Matching a given word by a regular expression. A log message is being * written to stdout. * * Hint: The implementation is based on the explanation being given in the * introduction to {@link Pattern} * * @param word This string will be matched by the subsequent argument. * @param regexp The regular expression tested to match the previous argument. * @return true if regexp matches word, false otherwise. */ public static boolean testMatch(final String word, final String regexp) { .../* to be implemented by <emphasis role="bold">**YOU**</emphasis> */ }</programlisting> <para>As noted in the <link linkend="gloss_Javadoc"><trademark>Javadoc</trademark></link> comment above you may want to read the documentation of class <classname>java.util.regex.Pattern</classname>. 
The intended output of the above application is:</para> <programlisting language="none">The expression '[A-K].*' matches 'Eric' The expression '[^0-9]+.*' ... ...</programlisting> </question> <answer> <para>A possible implementation is given by <classname>sda.regexp.RegexpPrimer</classname>.</para> </answer> </qandaentry> </qandadiv> </qandaset> <qandaset defaultlabel="qanda" xml:id="exercisefilterUserInput"> <title>Input validation by regular expressions</title> <qandadiv> <qandaentry> <question> <para>The application of <xref linkend="sqlInjectDropTable"/> proved to be vulnerable to SQL injection. Sanitize the values of both user input fields to prevent such behaviour.</para> <itemizedlist> <listitem> <para>Find appropriate regular expressions to check both username and email. Some hints:</para> <glosslist> <glossentry> <glossterm>username</glossterm> <glossdef> <para>Regarding SQL injection the <quote>;</quote> character is among the most critical. You may want to exclude certain special characters. This does no harm since their presence in a user's name is likely to be a typo rather than any meaningful input.</para> </glossdef> </glossentry> <glossentry> <glossterm>email</glossterm> <glossdef> <para>There are tons of <quote>ultimate</quote> regular expressions available to check email addresses. Remember that the present task is to avoid SQL injection rather than to reject <quote>wrong</quote> email addresses. So find a reasonable expression which may be too permissive regarding RFC email syntax rules but is sufficient to secure your application.</para> <para>A concise definition of an email's syntax is given in <link xlink:href="http://tools.ietf.org/html/rfc5322#section-3.4.1">RFC5322</link>. Its implementation is beyond the scope of the current lecture. 
Moreover it is questionable whether E-mail clients and mail transfer agents implement strict RFC compliance.</para> </glossdef> </glossentry> </glosslist> <para>Both regular expressions must cover the whole user input from the beginning to the end. This can be achieved by using <code>^ ... $</code>.</para> </listitem> <listitem> <para>The <link linkend="gloss_Java"><trademark>Java</trademark></link> standard class <classname>javax.swing.InputVerifier</classname> may help you validate user input.</para> </listitem> <listitem> <para>The following screenshot may provide an idea for GUI realization and user interaction in case of errors. Of course the submit button's action should be disabled in case of erroneous input. The user should receive a helpful error message instead.</para> <figure xml:id="figInsertValidate"> <title>Error message being presented to the user.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/insertValidate.screen.png"/> </imageobject> <caption> <para>In the current example the trailing <quote>;</quote> within the E-Mail field is invalid.</para> </caption> </mediaobject> </figure> </listitem> </itemizedlist> </question> <answer> <para>Extending <classname>javax.swing.InputVerifier</classname> allows us to build a generic class to filter user text input by arbitrary regular expressions:</para> <programlisting language="none">package sda.jdbc.intro.v1.sanitize; ... public class RegexpVerifier extends InputVerifier { final Pattern syntaxPattern; final JLabel validationLabel; private boolean inputValid = false; private final String errMsg; ... 
public RegexpVerifier (final String regex, final JLabel validationLabel, final String errMsg) { this.validationLabel = validationLabel; this.errMsg = errMsg; syntaxPattern = Pattern.compile(regex); } @Override public boolean verify(JComponent input) { if (input instanceof JTextField) { final String userInput = ((JTextField) input).getText(); if (syntaxPattern.matcher(userInput).find()) { validationLabel.setText(""); inputValid = true; } else { validationLabel.setText(errMsg); inputValid = false; } } return inputValid; } public boolean inputIsValid () { return inputValid; } }</programlisting> <para>Instances of <classname>sda.jdbc.intro.v1.sanitize.RegexpVerifier</classname> <coref linkend="emailVerifier"/> <coref linkend="nameVerifier"/> may now be used to validate our two input data fields <coref linkend="setNameValidation"/> <coref linkend="setEmailValidation"/>. We put emphasis on the changes with respect to <classname>sda.jdbc.intro.v1.InsertPerson</classname>:</para> <programlisting language="none">package sda.jdbc.intro.v1.sanitize; ... public class InsertPerson extends JFrame { final JTextField nameField = new JTextField(15); final JLabel nameFieldValidationLabel <co xml:id="nameVerifier"/> = new JLabel(); final RegexpVerifier nameFieldVerifier = new RegexpVerifier( "^[^;'\"]+$", nameFieldValidationLabel, "No special characters"); final JTextField emailField = new JTextField(20); final JLabel emailFieldValidationLabel <co xml:id="emailVerifier"/> = new JLabel(); final RegexpVerifier emailFieldVerifier = new RegexpVerifier("^[\\w\\-\\.\\_]+@[\\w\\-\\.]*[a-zA-Z]{2,4}$", emailFieldValidationLabel, "email not valid"); ... public static void main(String[] args) throws SQLException { InsertPerson app = new InsertPerson(); app.setVisible(true); } public InsertPerson (){ ... 
databaseFieldPanel.add(nameField); <emphasis role="bold">nameFieldValidationLabel.setForeground(Color.RED); databaseFieldPanel.add(nameFieldValidationLabel); nameField.setInputVerifier(nameFieldVerifier);</emphasis> <co xml:id="setNameValidation"/> databaseFieldPanel.add(new JLabel("E-mail:")); databaseFieldPanel.add(emailField); <emphasis role="bold">databaseFieldPanel.add(emailFieldValidationLabel); emailFieldValidationLabel.setForeground(Color.RED); emailField.setInputVerifier(emailFieldVerifier);</emphasis> <co xml:id="setEmailValidation"/> insertButton.addActionListener(new ActionListener() { @Override public void actionPerformed(ActionEvent e) { <emphasis role="bold">if (!nameFieldVerifier.inputIsValid() || !emailFieldVerifier.inputIsValid()) { JOptionPane.showMessageDialog(null, "Invalid input value(s)"); }</emphasis> else { ...</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> <section xml:id="sectPreparedStatements"> <title><classname>java.sql.PreparedStatement</classname> objects</title> <para>Sanitizing user input is an essential means to secure an application. The <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> standard however provides a mechanism being superior regarding the purpose of protecting applications against SQL injection attacks. We shed some light on our current mechanism sending SQL statements to a database server:</para> <figure xml:id="sqlTransport"> <title>SQL statements in <link linkend="gloss_Java"><trademark>Java</trademark></link> applications get parsed at the database server</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/sqlTransport.fig"/> </imageobject> </mediaobject> </figure> <para>This architecture raises two questions:</para> <orderedlist> <listitem> <para>What happens in case identical SQL statements are executed repeatedly? 
This may happen inside a loop when thousands of records with identical structure are being sent to a database.</para> </listitem> <listitem> <para>Is this architecture adequate with respect to security concerns?</para> </listitem> </orderedlist> <para>The first question is related to performance: Repeatedly parsing statements which are identical except for the values contained within is a waste of resources. We consider the transfer of records between different databases:</para> <programlisting language="none">INSERT INTO Person VALUES ('Jim', 'jim@q.org') INSERT INTO Person VALUES ('Eve', 'eve@y.org') INSERT INTO Person VALUES ('Pete', 'p@rr.com') ...</programlisting> <para>In this case it does not make sense to repeatedly parse structurally identical SQL statements. Using single <code>INSERT</code> statements with multiple data records may not be an option when the number of records grows.</para> <para>The second question is related to our current security topic: The database server's interpreter may be so <quote>kind</quote> as to interpret an attacker's malicious code as well.</para> <para>Both topics are addressed by <classname>java.sql.PreparedStatement</classname> objects. Basically these objects allow for separation of an SQL statement's structure from the parameter values contained within. The scenario given in <xref linkend="sqlTransport"/> may be implemented as:</para> <figure xml:id="sqlTransportPrepare"> <title>Using <classname>java.sql.PreparedStatement</classname> objects.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/sqlTransportPrepare.fig"/> </imageobject> </mediaobject> </figure> <para>Prepared statements are an example of parameterized SQL statements which exist in various programming languages. When using <classname>java.sql.PreparedStatement</classname> instances we actually have three distinct phases:</para> <orderedlist> <listitem> <para xml:id="exerciseGuiWritePrepared">Creating an instance of <classname>java.sql.PreparedStatement</classname>. 
The SQL statement possibly containing placeholders gets parsed.</para> </listitem> <listitem> <para>Setting all placeholder values. This does not involve any further SQL syntax parsing.</para> </listitem> <listitem> <para>Execute the statement.</para> </listitem> </orderedlist> <para>Steps 2 and 3 may be repeated as often as desired without any re-parsing of SQL statements thus saving resources on the database server side.</para> <para>Our introductory toy application <xref linkend="figJdbcSimpleWrite"/> may be rewritten using <classname>java.sql.PreparedStatement</classname> objects:</para> <programlisting language="none">package sda.jdbc.intro.v1; ... public class SimpleInsert { public static void main(String[] args) throws SQLException { final Connection conn = DriverManager.getConnection (... // Step 2: Create a PreparedStatement instance final PreparedStatement pStmt = conn.prepareStatement( "INSERT INTO Person VALUES(<emphasis role="bold">?, ?</emphasis>)");<co xml:id="listPrepCreate"/> // Step 3a: Fill in desired attribute values pStmt.setString(1, "Jim");<co xml:id="listPrepSet1"/> pStmt.setString(2, "jim@foo.org");<co xml:id="listPrepSet2"/> // Step 3b: Execute the desired INSERT final int updateCount = pStmt.executeUpdate();<co xml:id="listPrepExec"/> // Step 4: Give feedback to the end user System.out.println("Successfully inserted " + updateCount + " dataset(s)"); } }</programlisting> <calloutlist> <callout arearefs="listPrepCreate"> <para>An instance of <classname>java.sql.PreparedStatement</classname> is being created. 
Notice the two question marks representing two place holders for string values to be inserted in the next step.</para> </callout> <callout arearefs="listPrepSet1 listPrepSet2"> <para>Fill in the two placeholder values being defined at <coref linkend="listPrepCreate"/>.</para> <caution> <para>Since half the world of programming folks will index a list of n elements starting from 0 to n-1, <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> apparently counts from 1 to n. Working with <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> would have been too easy otherwise.</para> </caution> </callout> <callout arearefs="listPrepExec"> <para>Execute the beast! Notice the empty parameter list. No SQL is required since we already prepared it in <coref linkend="listPrepCreate"/>.</para> </callout> </calloutlist> <para>The problem of SQL injection disappears completely when using <classname>java.sql.PreparedStatement</classname> instances. An attacker may safely enter offending strings like:</para> <programlisting language="none">Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting> <para>The above string will be taken <quote>as is</quote> and thus simply becomes part of the database server's content.</para> <qandaset defaultlabel="qanda" xml:id="exerciseSqlInjectPrepare"> <title>Prepared Statements to keep the barbarians at the gate</title> <qandadiv> <qandaentry> <question> <para>In <xref linkend="sqlInjectDropTable"/> we found our implementation in <xref linkend="exerciseGuiWriteTakeTwo"/> to be vulnerable with respect to SQL injection. Rather than sanitizing user input you shall use <classname>java.sql.PreparedStatement</classname> objects to secure the application.</para> </question> <answer> <para>Due to our separation of GUI and persistence handling we only need to re-implement <classname>sda.jdbc.intro.sqlinject.PersistenceHandler</classname>. 
We have to replace <classname>java.sql.Statement</classname> by <classname>java.sql.PreparedStatement</classname> instances. A possible implementation is <classname>sda.jdbc.intro.v1.prepare.PersistenceHandler</classname>. We may now safely enter offending strings like:</para>
<programlisting language="none">Jim', 'jim@c.com');DROP TABLE Person;INSERT INTO Person VALUES('Joe</programlisting>
<para>This time the input value is taken <quote>as is</quote> and yields the following error message:</para>
<informalfigure> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/sqlInjectPrepare.screen.png"/> </imageobject> </mediaobject> </informalfigure>
<para>The offending string exceeds the length of the attribute <code>name</code> within the database table <code>Person</code>. We may enlarge this value to allow the <code>INSERT</code> operation:</para>
<programlisting language="none">CREATE TABLE Person (
   name char(<emphasis role="bold">80</emphasis>) <emphasis role="bold">-- a little bit longer --</emphasis>
  ,email CHAR(20) UNIQUE
);</programlisting> </answer> </qandaentry> </qandadiv> </qandaset>
<para>We could have followed a test-driven development approach, writing tests before actually implementing our application. In this lecture we proceed the other way round: the following exercise adds tests to existing code. Such tests help to assure software quality when fixing bugs or extending an application.</para>
<para>The subsequent exercise requires the <productname xlink:href="http://testng.org/doc/eclipse.html#eclipse-installation">TestNG</productname> plugin for Eclipse to be installed. This should already be the case both in the MI exercise classrooms and in the VirtualBox image provided at <uri xlink:href="ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi">ftp://mirror.mi.hdm-stuttgart.de/ubuntu/VirtualBox/lubuntu.vdi</uri>. 
If you use a private Eclipse installation you may want to follow <xref linkend="testngInstall"/>.</para>
<qandaset defaultlabel="qanda" xml:id="quandaentry_DbLayerUnitTest"> <title>Testing <classname>sda.jdbc.intro.v1.PersistenceHandler</classname> using <productname xlink:href="http://testng.org">TestNG</productname></title> <qandadiv> <qandaentry> <question> <para>Read <xref linkend="chapUnitTesting"/>. Then test:</para> <itemizedlist> <listitem> <para>Proper behaviour when opening and closing connections.</para> </listitem> <listitem> <para>Proper behaviour when inserting data.</para> </listitem> <listitem> <para>Expected behaviour when entering duplicate values violating integrity constraints. Look for error messages as well.</para> </listitem> </itemizedlist> <para>You may write code to initialize the database state appropriately prior to starting the tests.</para> </question> <answer> <para>A possible <productname xlink:href="http://testng.org">TestNG</productname> test class is <classname>sda.jdbc.intro.v1.prepare.PersistenceHandlerTest</classname>.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> </section>
<section xml:id="jdbcRead"> <title>Read Access</title>
<para>So far we've sent records to a database server. Applications however need both directions: pushing data to a server and receiving data as well. The overall process looks like:</para>
<figure xml:id="jdbcReadWrite"> <title>Server / client object's life cycle</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcReadWrite.fig"/> </imageobject> </mediaobject> </figure>
<para>So far we've only covered the second (<code>UPDATE</code>) part of this picture. Reading objects from a database server into a client's (transient) address space requires a container object to hold the data in question. 
Though <link linkend="gloss_Java"><trademark>Java</trademark></link> offers standard container interfaces like <classname>java.util.List</classname>, the <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> standard defines its own container interface <classname>java.sql.ResultSet</classname>. Instances of <classname>java.sql.ResultSet</classname> will hold transient copies of (database) objects. The next figure outlines the basic approach:</para>
<figure xml:id="figJdbcRead"> <title>Reading data from a database server.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/jdbcread.fig" scale="65"/> </imageobject> </mediaobject> </figure>
<para>We take an example. Suppose our database contains a table of our friends' nicknames and their respective birth dates:</para>
<table border="1" xml:id="figRelationFriends"> <caption>Names and birth dates of friends.</caption> <tr> <td><programlisting language="none">CREATE TABLE Friends (
   id INTEGER NOT NULL PRIMARY KEY
  ,nickname char(10)
  ,birthdate DATE
);</programlisting></td> <td><programlisting language="none">INSERT INTO Friends VALUES
   (1, 'Jim', '1991-10-10')
  ,(2, 'Eve', '2003-05-24')
  ,(3, 'Mick','2001-12-30')
;</programlisting></td> </tr> </table>
<para>Following the outline in <xref linkend="figJdbcRead"/> we may access our data by:</para>
<figure xml:id="listingJdbcRead"> <title>Accessing relational data</title> <programlisting language="none">package sda.jdbc.intro;
...
public class SimpleRead {

  public static void main(String[] args) throws SQLException {

    // Step 1: Open a connection to the database server
    final Connection conn = DriverManager.getConnection (
        DbProps.getString("PersistenceHandler.jdbcUrl"),
        DbProps.getString("PersistenceHandler.username"),
        DbProps.getString("PersistenceHandler.password"));

    // Step 2: Create a Statement instance
    final Statement stmt = conn.createStatement();

    <emphasis role="bold">// Step 3: Creating the client side JDBC container holding our data records</emphasis>
    <emphasis role="bold">final ResultSet data = stmt.executeQuery("SELECT * FROM Friends");</emphasis> <co linkends="listingJdbcRead-1" xml:id="listingJdbcRead-1-co"/>

    <emphasis role="bold">// Step 4: Dataset iteration
    while (data.next()) {</emphasis> <co linkends="listingJdbcRead-2" xml:id="listingJdbcRead-2-co"/>
      <emphasis role="bold">System.out.println(data.getInt("id")</emphasis> <co linkends="listingJdbcRead-3" xml:id="listingJdbcRead-3-co"/>
          <emphasis role="bold">+ ", " + data.getString("nickname")</emphasis> <co linkends="listingJdbcRead-3" xml:id="listingJdbcRead-4-co"/>
          <emphasis role="bold">+ ", " + data.getString("birthdate"));</emphasis> <co linkends="listingJdbcRead-3" xml:id="listingJdbcRead-5-co"/>
    }
  }
}</programlisting> </figure>
<para>The marked code segment above shows the differences with respect to our data insertion application <classname>sda.jdbc.intro.SimpleInsert</classname>. 
Some remarks are in order:</para>
<calloutlist> <callout arearefs="listingJdbcRead-1-co" xml:id="listingJdbcRead-1"> <para>As mentioned in the introduction to this section, the <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> standard comes with its own container interface rather than <classname>java.util.List</classname> or similar.</para> </callout> <callout arearefs="listingJdbcRead-2-co" xml:id="listingJdbcRead-2"> <para>Calling <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#next()">next()</link> prior to actually accessing data on the client side is mandatory! The cursor is initially positioned before the first row. The first call to <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#next()">next()</link> moves it to the first row of our dataset, or returns <code>false</code> if the dataset is empty. Follow the link address and <emphasis role="bold">read</emphasis> the documentation.</para> </callout> <callout arearefs="listingJdbcRead-3-co listingJdbcRead-4-co listingJdbcRead-5-co" xml:id="listingJdbcRead-3"> <para>The access methods have to be chosen according to matching types. An overview of database/<link linkend="gloss_Java"><trademark>Java</trademark></link> type mappings is given in <uri xlink:href="http://docs.oracle.com/javase/1.3/docs/guide/jdbc/getstart/mapping.html">http://docs.oracle.com/javase/1.3/docs/guide/jdbc/getstart/mapping.html</uri>.</para> </callout> </calloutlist>
<qandaset defaultlabel="qanda" xml:id="quandaentry_JdbcTypeConversion"> <title>Getter methods and type conversion</title> <qandadiv> <qandaentry> <question> <para>Apart from type mappings the <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> access methods like <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getString(int)">getString()</link> may also be used for type conversion. 
Modify <xref linkend="listingJdbcRead"/> by:</para> <itemizedlist> <listitem> <para>Read the database attribute <code>id</code> by <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getString(java.lang.String)">getString(String)</link>.</para> </listitem> <listitem> <para>Read the database attribute <code>nickname</code> by <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getInt(java.lang.String)">getInt(String)</link>.</para> </listitem> </itemizedlist> <para>What do you observe?</para> </question> <answer> <para>Modifying our iteration loop:</para>
<programlisting language="none">// Step 4: Dataset iteration
while (data.next()) {
  System.out.println(data.<emphasis role="bold">getString</emphasis>("id") <co linkends="jdbcReadWrongType-1" xml:id="jdbcReadWrongType-1-co"/>
      + ", " + data.<emphasis role="bold">getInt</emphasis>("nickname") <co linkends="jdbcReadWrongType-2" xml:id="jdbcReadWrongType-2-co"/>
      + ", " + data.getString("birthdate"));
}</programlisting>
<para>We observe:</para>
<calloutlist> <callout arearefs="jdbcReadWrongType-1-co" xml:id="jdbcReadWrongType-1"> <para>Calling <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getString(int)">getString()</link> for a database attribute of type INTEGER does not cause any trouble: The value gets silently converted to a string value.</para> </callout> <callout arearefs="jdbcReadWrongType-2-co" xml:id="jdbcReadWrongType-2"> <para>Calling <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#getInt(java.lang.String)">getInt(String)</link> for the database field of type CHAR yields an (expected) Exception:</para> </callout> </calloutlist>
<programlisting language="none">Exception in thread "main" java.sql.SQLException: Invalid value for getInt() - 'Jim'
  at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
  ...</programlisting>
<para>We may however provide <quote>compatible</quote> data 
records:</para>
<programlisting language="none">DELETE FROM Friends;
INSERT INTO Friends VALUES (1, <emphasis role="bold">'31'</emphasis>, '1991-10-10');</programlisting>
<para>This time our application executes perfectly well:</para>
<programlisting language="none">1, 31, 1991-10-10</programlisting>
<para>Conclusion: The <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark> driver performs a conversion from a string type to an integer, similar to the <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/lang/Integer.html#parseInt(java.lang.String)">parseInt(String)</link> method.</para>
<para>The next series of exercises aims at a more powerful implementation of our person data insertion application in <xref linkend="exerciseInsertLoginCredentials"/>.</para> </answer> </qandaentry> </qandadiv> </qandaset>
<qandaset defaultlabel="qanda" xml:id="quandaentry_HandlingNull"> <title>Handling NULL values.</title> <qandadiv> <qandaentry> <question> <para>The attribute <code>nickname</code> in our database table <code>Friends</code> allows <code>NULL</code> values:</para>
<programlisting language="none">INSERT INTO Friends VALUES
   (1, 'Jim', '1991-10-10')
  ,(2, <emphasis role="bold"> NULL</emphasis>, '2003-5-24')
  ,(3, 'Mick', '2001-12-30');</programlisting>
<para>Starting our current application yields:</para>
<programlisting language="none">1, Jim, 1991-10-10
2, null, 2003-05-24
3, Mick, 2001-12-30</programlisting>
<para>This might be confused with a person having the nickname <quote>null</quote>. 
Instead we would like to have:</para>
<programlisting language="none">1, Jim, 1991-10-10
2, -Name unknown- , 2003-05-24
3, Mick, 2001-12-30</programlisting>
<para>Extend the current code of <classname>sda.jdbc.intro.SimpleRead</classname> to produce the above result in case of nickname <code>NULL</code> values.</para> <para>Hint: Read the documentation of <link xlink:href="http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#wasNull()">wasNull()</link>.</para> </question> <answer> <para>A possible implementation is given in <classname>sda.jdbc.intro.v1.SimpleRead</classname>.</para> </answer> </qandaentry> </qandadiv> </qandaset>
<qandaset defaultlabel="qanda" xml:id="exerciseInsecureAuth"> <title>A user authentication <quote>strategy</quote></title> <qandadiv> <qandaentry> <question> <para>Our current application for entering <code>Person</code> records lacks authentication: A user simply connects to the database using credentials hard-coded in a properties file. 
A programmer suggests implementing authentication based on the following extension of the <code>Person</code> table:</para>
<programlisting language="none">CREATE TABLE Person (
   name char(80) NOT NULL
  ,email CHAR(20) NOT NULL UNIQUE
  ,login CHAR(10) UNIQUE -- login names must be unique --
  ,password CHAR(20)
);</programlisting>
<para>On clicking <quote>Connect</quote> a user may enter a login name and password, <quote>fred</quote> and <quote>12345678</quote> in the following example:</para>
<figure xml:id="figLogin"> <title>Login credentials for database connection</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/login.screen.png" scale="90"/> </imageobject> </mediaobject> </figure>
<para>Based on these input values the following SQL query is executed by a <classname>java.sql.Statement</classname> object:</para>
<programlisting language="none">SELECT * FROM Person WHERE login='<emphasis role="bold">fred</emphasis>' and password = '<emphasis role="bold">12345678</emphasis>'</programlisting>
<para>Since the login attribute is UNIQUE we are sure to receive either 0 or 1 dataset. Our programmer proposes to grant login if the query returns at least one dataset.</para>
<para>Discuss this implementation sketch with a colleague. Do you think this is a sensible approach? <emphasis role="bold">Write down</emphasis> your results.</para> </question> <answer> <para>The approach is essentially unusable due to severe security implications. Since it is based on <classname>java.sql.Statement</classname> rather than on <classname>java.sql.PreparedStatement</classname> objects it is vulnerable to SQL injection attacks. 
A user may enter the following password value in the GUI:</para>
<programlisting language="none">sd' OR '1' = '1</programlisting>
<para>Based on the login name <quote>fred</quote> the following SQL string is crafted:</para>
<programlisting language="none">SELECT * FROM Person WHERE login='fred' and password = 'sd' OR <emphasis role="bold">'1' = '1'</emphasis>;</programlisting>
<para>Since <code>AND</code> takes precedence over <code>OR</code>, the trailing <code>'1' = '1'</code> stands on its own and always evaluates to true. Thus all objects from the <code>Person</code> relation are returned, permitting login.</para>
<para>The implementation approach suffers from a second deficiency: The passwords are stored in clear text. An attacker who gains access to the <code>Person</code> table immediately obtains the passwords of all users. This problem can be solved by storing hash values of passwords rather than the clear text values themselves.</para> </answer> </qandaentry> </qandadiv> </qandaset>
<qandaset defaultlabel="qanda" xml:id="exerciseHashTraining"> <title>Passwords and hash values</title> <qandadiv> <qandaentry> <question> <para>In exercise <xref linkend="exerciseInsecureAuth"/> we discarded the idea of clear text passwords in favour of password hashes. To defend against rainbow table attacks, so-called salted hashes are superior. You should read <uri xlink:href="https://www.heckrothindustries.co.uk/articles/an-introduction-to-password-hashes">https://www.heckrothindustries.co.uk/articles/an-introduction-to-password-hashes</uri> for overview purposes. 
The article contains further references at the bottom of the page.</para>
<para>With respect to an implementation <uri xlink:href="http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java#11038230">http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java</uri> provides a simple example for:</para> <itemizedlist> <listitem> <para>Creating a salted hash from a given password string.</para> </listitem> <listitem> <para>Verifying whether a hash string matches a given clear text password.</para> </listitem> </itemizedlist>
<para>The example uses an external library. On <productname xlink:href="http://www.ubuntu.com">Ubuntu</productname> Linux this may be installed by issuing <command>aptitude</command> <option>install</option> <option>libcommons-codec-java</option>. On successful install the file <filename>/usr/share/java/commons-codec-1.5.jar</filename> may be appended to your <envar>CLASSPATH</envar>.</para>
<para>You may as well use <uri xlink:href="http://crackstation.net/hashing-security.htm#javasourcecode">http://crackstation.net/hashing-security.htm#javasourcecode</uri> as a starting point. This example works standalone without needing an external library. Note: This example produces different (incompatible) hash values.</para>
<para>Create a simple main() method to experiment with the two class methods.</para> </question> <answer> <para>Starting from <uri xlink:href="http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java#11038230">http://stackoverflow.com/questions/2860943/suggestions-for-library-to-hash-passwords-in-java</uri> we create a slightly modified class <classname>sda.jdbc.intro.auth.HashProvider</classname> offering both hash providing <coref linkend="hashProviderMethod"/> and verifying <coref linkend="hashVerifyMethod"/> methods:</para>
<programlisting language="none">package sda.jdbc.intro.auth;
...
public class HashProvider {
  ...
  /** Computes a salted PBKDF2 hash of given plaintext password
      suitable for storing in a database. */
  public static <emphasis role="bold">String getSaltedHash</emphasis> <co xml:id="hashProviderMethod"/>(char [] password) {
    byte[] salt;
    try {
      salt = SecureRandom.getInstance("SHA1PRNG").generateSeed(saltLen);
      // store the salt with the password
      return Base64.encodeBase64String(salt) + "$" + hash(password, salt);
    } catch (NoSuchAlgorithmException e) {
      e.printStackTrace();
    }
    System.exit(1);
    return null;
  }

  /** Checks whether given plaintext password corresponds
      to a stored salted hash of the password. */
  public static <emphasis role="bold">boolean check</emphasis> <co xml:id="hashVerifyMethod"/>(char[] password, String stored){
    String[] saltAndPass = stored.split("\\$");
    if (saltAndPass.length != 2)
      return false;
    String hashOfInput = hash(password, Base64.decodeBase64(saltAndPass[0]));
    return hashOfInput.equals(saltAndPass[1]);
  }
  ...}</programlisting>
<para>We may test the two class methods <methodname>sda.jdbc.intro.auth.HashProvider.getSaltedHash(char[])</methodname> and <methodname>sda.jdbc.intro.auth.HashProvider.check(char[],String)</methodname> by a separate driver class. 
Notice the <quote>$</quote> sign <coref linkend="saltPwhashSeparator"/> separating salt and password hash:</para>
<programlisting language="none">package sda.jdbc.intro.auth;

public class TestHashProvider {

  public static void main(String [] args) throws Exception {
    final char [] clearText = {'s', 'e', 'c'};
    final String hash = <emphasis role="bold">HashProvider.getSaltedHash(clearText)</emphasis>;
    System.out.println("Hash:" + hash);
    if (HashProvider.check(clearText, <co xml:id="saltPwhashSeparator"/>
        "<emphasis role="bold">HwX2DkuYiwp7xogm3AGndza8DKRVvCMntxRvCrCGFPw=</emphasis>$<emphasis role="bold">6Ix11yHNB4uPZuF2IQYxVV/MYragJwTDE33OIFR9a24=</emphasis>")) {
      System.out.println("hash matches");
    } else {
      System.out.println("hash does not match");
  ...</programlisting> </answer> </qandaentry> </qandadiv> </qandaset>
<qandaset defaultlabel="qanda" xml:id="exerciseInsertLoginCredentials"> <title>GUI authentication: The real McCoy</title> <qandadiv> <qandaentry> <question> <para>We now implement a refined version to enter <code>Person</code> records based on the solutions of two related exercises:</para> <glosslist> <glossentry> <glossterm><xref linkend="exercisefilterUserInput"/></glossterm> <glossdef> <para>Avoiding SQL injection by sanitizing user input.</para> </glossdef> </glossentry> <glossentry> <glossterm><xref linkend="exerciseSqlInjectPrepare"/></glossterm> <glossdef> <para>Avoiding SQL injection by using <classname>java.sql.PreparedStatement</classname> objects.</para> </glossdef> </glossentry> </glosslist>
<para>A better solution should combine both techniques. Protection against SQL injection is a basic requirement. Checking an e-mail address for minimal conformance is an added value.</para>
<para>In order to address authentication the relation <code>Person</code> has to be extended appropriately. The GUI needs two additional fields for login name and password as well. 
The following video demonstrates the intended behaviour:</para>
<figure xml:id="videoConnectAuth"> <title>Intended usage behaviour for insertion of data records.</title> <mediaobject> <videoobject> <videodata fileref="Ref/Video/connectauth.mp4"/> </videoobject> </mediaobject> </figure>
<para>Don't forget to use password hashes like those from <xref linkend="exerciseHashTraining"/>. Due to their length you may want to consider the data type <code>TEXT</code>.</para> </question> <answer> <para>In comparison to earlier versions it makes sense to add some internal container structures. We note that each GUI input field requires:</para> <itemizedlist> <listitem> <para>A label like <quote>Enter password</quote>.</para> </listitem> <listitem> <para>A corresponding field object to hold user entered input.</para> </listitem> <listitem> <para>A validator checking for correctness of entered data.</para> </listitem> <listitem> <para>A label or text field for warning messages in case of invalid user input.</para> </listitem> </itemizedlist>
<para>We start by grouping the label <coref linkend="uiuLabel"/>, the input field's verifier <coref linkend="uiuVerifier"/> and the error message label <coref linkend="uiuErrmsg"/> in <classname>sda.jdbc.intro.auth.UserInputUnit</classname>:</para>
<programlisting language="none">package sda.jdbc.intro.auth;
...
public class UserInputUnit {

  final JLabel label; <co xml:id="uiuLabel"/>
  final InputVerifierNotify verifier; <co xml:id="uiuVerifier"/>
  final JLabel errorMessage; <co xml:id="uiuErrmsg"/>

  public UserInputUnit(final String guiText, final InputVerifierNotify verifier) {
    this.label = new JLabel(guiText);
    this.verifier = verifier;
    errorMessage = new JLabel();
  }
  ...</programlisting>
<para>The actual GUI text field is defined <coref linkend="verfierGuiField"/> in class <classname>sda.jdbc.intro.auth.InputVerifierNotify</classname>:</para>
<programlisting language="none">package sda.jdbc.intro.auth;
...
public abstract class InputVerifierNotify extends InputVerifier {

  protected final String errorMessage;
  public final JLabel validationLabel;
  public final JTextField field; <co xml:id="verfierGuiField"/>

  public InputVerifierNotify(final JTextField field, final String errorMessage) {
  ...</programlisting>
<para>We need two field verifier classes derived from <classname>sda.jdbc.intro.auth.InputVerifierNotify</classname>:</para> <glosslist> <glossentry> <glossterm><classname>sda.jdbc.intro.auth.RegexpVerifier</classname></glossterm> <glossdef> <para>This one is well known from earlier versions and is used to validate text input fields by regular expressions.</para> </glossdef> </glossentry> <glossentry> <glossterm><classname>sda.jdbc.intro.auth.EqualValueVerifier</classname></glossterm> <glossdef> <para>This verifier class is responsible for checking that our two password fields have identical values.</para> </glossdef> </glossentry> </glosslist>
<para>All these components get assembled in <classname>sda.jdbc.intro.auth.InsertPerson</classname>. We remark some important points:</para>
<programlisting language="none">package sda.jdbc.intro.auth;
...
public class InsertPerson extends JFrame {
  ...
  // GUI attributes for user input
  final UserInputUnit name = <co linkends="listingInsertUserAuth-1" xml:id="listingInsertUserAuth-1-co"/>
      new UserInputUnit(
          "Name",
          new RegexpVerifier(new JTextField(15), "^[^;'\"]+$", "No special characters allowed"));

  // We need a reference to the password field to avoid
  // casting from JTextField later.
  private final JPasswordField passwordField = new JPasswordField(10); <co linkends="listingInsertUserAuth-2" xml:id="listingInsertUserAuth-2-co"/>
  private final UserInputUnit password =
      new UserInputUnit(
          "Password",
          new RegexpVerifier(passwordField, "^.{6,20}$", "length from 6 to 20 characters"));
  ...
  private final UserInputUnit passwordRepeat =
      new UserInputUnit(
          "repeat pass.",
          new EqualValueVerifier <co linkends="listingInsertUserAuth-3" xml:id="listingInsertUserAuth-3-co"/> (new JPasswordField(10), passwordField, "Passwords do not match"));

  private final UserInputUnit [] userInputUnits = <co linkends="listingInsertUserAuth-4" xml:id="listingInsertUserAuth-4-co"/>
      {name, email, login, password, passwordRepeat};
  ...
  private void userLoginDialog() {...}
  ...
  public InsertPerson (){
    ...
    databaseFieldPanel.setLayout(new GridLayout(0, 3)); //Third column for validation label
    add(databaseFieldPanel);

    for (UserInputUnit unit: userInputUnits) { <co linkends="listingInsertUserAuth-5" xml:id="listingInsertUserAuth-5-co"/>
      databaseFieldPanel.add(unit.label);
      databaseFieldPanel.add(unit.verifier.field);
      databaseFieldPanel.add(unit.verifier.validationLabel);
    }
    insertButton.addActionListener(new ActionListener() {
      @Override public void actionPerformed(ActionEvent e) {
        if (inputValuesAllValid()) {
          if (persistenceHandler.add( <co linkends="listingInsertUserAuth-6" xml:id="listingInsertUserAuth-6-co"/>
              name.getText(),
              email.getText(),
              login.getText(),
              passwordField.getPassword())) {
            clearMask();
  ...}

  private void clearMask() { <co linkends="listingInsertUserAuth-7" xml:id="listingInsertUserAuth-7-co"/>
    for (UserInputUnit unit: userInputUnits) {
      unit.verifier.field.setText("");
      unit.verifier.clear();
    }
  }

  private boolean inputValuesAllValid() {<co linkends="listingInsertUserAuth-8" xml:id="listingInsertUserAuth-8-co"/>
    for (UserInputUnit unit: userInputUnits) {
      if (!unit.verifier.verify(unit.verifier.field)){
        return false;
      }
    }
    return true;
  }
}</programlisting>
<calloutlist> <callout arearefs="listingInsertUserAuth-1-co" xml:id="listingInsertUserAuth-1"> <para>All GUI related stuff for entering a user's name.</para> </callout> <callout arearefs="listingInsertUserAuth-2-co" xml:id="listingInsertUserAuth-2"> <para>Password fields need special treatment: <code>getText()</code> is 
superseded by <code>getPassword()</code>. In order to avoid casts from <classname>javax.swing.JTextField</classname> to <classname>javax.swing.JPasswordField</classname> we simply keep an extra reference.</para> </callout> <callout arearefs="listingInsertUserAuth-3-co" xml:id="listingInsertUserAuth-3"> <para>In order to check both password fields for identical values we need a different validator <classname>sda.jdbc.intro.auth.EqualValueVerifier</classname> expecting both password fields in its constructor.</para> </callout> <callout arearefs="listingInsertUserAuth-4-co" xml:id="listingInsertUserAuth-4"> <para>All 5 user input elements get grouped by an array. This allows for iterations like in <coref linkend="listingInsertUserAuth-7-co"/> or <coref linkend="listingInsertUserAuth-8-co"/>.</para> </callout> <callout arearefs="listingInsertUserAuth-5-co" xml:id="listingInsertUserAuth-5"> <para>Adding all GUI elements to the base pane in a loop.</para> </callout> <callout arearefs="listingInsertUserAuth-6-co" xml:id="listingInsertUserAuth-6"> <para>Providing user entered values to the persistence provider.</para> </callout> <callout arearefs="listingInsertUserAuth-7-co" xml:id="listingInsertUserAuth-7"> <para>Whenever a dataset has been successfully sent to the database we have to clean our GUI to possibly enter another record.</para> </callout> <callout arearefs="listingInsertUserAuth-8-co" xml:id="listingInsertUserAuth-8"> <para>Thanks to our grouping, aggregating the validation states of the individual GUI input fields becomes easy.</para> </callout> </calloutlist> </answer> </qandaentry> </qandadiv> </qandaset>
<qandaset defaultlabel="qanda" xml:id="quandaentry_ArchSecurity"> <title>Architectural security considerations</title> <qandadiv> <qandaentry> <question> <para>In <xref linkend="exerciseInsertLoginCredentials"/> we achieved end user credential protection. How about the overall application security? Provide improvement proposals if appropriate. 
Hint: Consider the way credentials are being supplied.</para> </question> <answer> <para>Connecting the client to our database server solely depends on credentials <coref linkend="databaseUserHdmPassword"/> being stored in a properties file <filename>database.properties</filename>:</para>
<programlisting language="none">PersistenceHandler.jdbcUrl=jdbc:mysql://localhost:3306/hdm
PersistenceHandler.username=hdmuser <co xml:id="databaseUserHdmUsername"/>
PersistenceHandler.password=<emphasis role="bold">XYZ</emphasis> <co xml:id="databaseUserHdmPassword"/></programlisting>
<para>This properties file is user accessible and contains the password in clear text. Arbitrary applications connecting to the database server using this account have all permissions granted to <code>hdmuser</code> <coref linkend="databaseUserHdmUsername"/>. In order for our application to work correctly the set of granted permissions includes at least inserting datasets. Thus anybody reading the properties file may insert a new user, e.g. <code>smith</code>, including credentials. Afterwards the original application can be started by logging in as <code>smith</code>.</para>
<para>Conclusion: The current application architecture is seriously flawed with respect to security.</para>
<para>Rather than using a common database account <code>hdmuser</code> we may configure per-user accounts on the database server having individual user credentials. This way user credentials are no longer stored in our <code>Person</code> table but are managed by the database server's user management and privilege facilities. 
This completely avoids storing credentials on the client side.</para> </answer> </qandaentry> </qandadiv> </qandaset>
<section xml:id="sda1SaxRdbms"> <title>SAX and RDBMS</title> <qandaset defaultlabel="qanda" xml:id="exercise_saxAttrib"> <title>Reading XML attributes</title> <qandadiv> <qandaentry xml:id="saxRdbms"> <question> <label>SAX processing with RDBMS access.</label> <para>Implement the example given in <xref linkend="saxRdbmsAccessPrinciple"/> to produce the output sketched in <xref linkend="saxPriceOut"/>. You may start by implementing <emphasis>and testing</emphasis> the following methods of an RDBMS interfacing class using <trademark xlink:href="http://electronics.zibb.com/trademark/jdbc/29545026">JDBC</trademark>:</para>
<programlisting language="none">package sax.rdbms;

public class RdbmsAccess {

  public void connect(final String host, final int port,
      final String userName, final String password) {
    // <emphasis role="bold">open connection to a database</emphasis>
  }

  public String readPrice(final String articleNumber) {
    return "0"; // <emphasis role="bold">To be implemented as access to a ResultSet object</emphasis>
  }

  public void close() {
    // <emphasis role="bold">close database connection</emphasis>
  }
}</programlisting>
<para>You may find it helpful to write a small testbed for the RDBMS access functionality prior to integrating it into your <acronym xlink:href="http://www.saxproject.org">SAX</acronym> application producing HTML output.</para> </question> <answer> <para>We start by creating a suitable RDBMS table:</para>
<programlisting language="none">CREATE SCHEMA AUTHORIZATION midb2
CREATE TABLE Product(
   orderNo CHAR(10) NOT NULL PRIMARY KEY
  ,price DECIMAL (9,2) NOT NULL
)</programlisting>
<para>Next we feed some toy data:</para>
<programlisting language="none">INSERT INTO Product VALUES('x-223', 330.20);
INSERT INTO Product VALUES('w-124', 110.40);</programlisting>
<para>Now we implement our RDBMS access class:</para>
<programlisting 
language="none">package dom.xsl; ... public class DbAccess { public void connect(final String jdbcUrl, final String userName, final String password) { try { conn = DriverManager.getConnection(jdbcUrl, userName, password); priceQuery = conn.prepareStatement(sqlPriceQuery); } catch (SQLException e) { System.err.println("Unable to open connection to database:" + e);} } public String readPrice(final String articleNumber) { String result; try { priceQuery.setString(1, articleNumber); final ResultSet rs = priceQuery.executeQuery(); if (rs.next()) { result = rs.getString("price"); } else { result = "No price available for article '" + articleNumber + "'"; } } catch (SQLException e) { result = "Error reading price for article '" + articleNumber + "':" + e; } return result; } public void close() { try {conn.close();} catch (SQLException e) { System.err.println("Error closing database connection:" + e); } } static { try { Class.forName("com.ibm.db2.jcc.DB2Driver"); } catch (ClassNotFoundException e) { System.err.println("Unable to register Driver:" + e);} } private static final String sqlPriceQuery = "SELECT price FROM Product WHERE orderNo = ?"; private PreparedStatement priceQuery = null; private Connection conn = null; }</programlisting> <para>This access layer may be tested independently from handling catalog instances:</para> <programlisting language="none">package dom.xsl; public class DbAccessDriver { public static void main(String[] args) { final DbAccess dbaccess = new DbAccess(); dbaccess.connect("jdbc:db2://db2.mi.hdm-stuttgart.de:10000/hdm", "midb2", "password"); System.out.println(dbaccess.readPrice("x-223")); System.out.println(dbaccess.readPrice("..aaargh!")); dbaccess.close(); } }</programlisting> <para>If the above test succeeds we may embed the RDBMS access layer into our <acronym xlink:href="http://www.saxproject.org">SAX</acronym> handler:</para> <programlisting language="none">package sax.rdbms; ... 
public class HtmlEventHandler extends DefaultHandler{ public void startDocument() { dbaccess.connect("jdbc:db2://db2.mi.hdm-stuttgart.de:10000/hdm", "midb2", "password"); System.out.println("<html><head><title>Catalog</title></head>"); } public void endDocument() { System.out.println("</html>"); dbaccess.close(); } public void startElement(String namespaceUri, String localName, String rawName, Attributes attrs){ if (rawName.equals("catalog")){ System.out.println("<body><H1>A catalog</H1>" +"<table border='1'><tbody>"); System.out.println("<tr><th>Order number</th>\n" + "<th>Price</th>\n" +" <th>Product</th></tr>"); } else if (rawName.equals("item")){ final String orderNo = attrs.getValue("orderNo"); System.out.print("<tr><td>" + orderNo + "</td>\n<td>" + dbaccess.readPrice(orderNo) + "</td>\n<td>"); } else { System.err.println("Element '" + rawName + "' unknown"); } } public void endElement(String namespaceUri, String localName, String rawName) { if (rawName.equals("catalog")){ System.out.println("</tbody></table></body>"); } else if (rawName.equals("item")){ System.out.println("</td></tr>\n"); } } public void characters(char[] ch, int start, int length) { System.out.print(new String(ch, start, length)); } private DbAccess dbaccess = new DbAccess(); }</programlisting> </answer> </qandaentry> </qandadiv> </qandaset> </section> </section> </section> </chapter> <chapter xml:id="chapUnitTesting"> <title>Unit testing with <productname xlink:href="http://testng.org">TestNG</productname></title> <para>This chapter presents a very short introduction to the basic usage of unit testing. 
We start with a simple stack implementation:</para> <programlisting language="none">package sda.unittesting; public class MyStack { int [] data = new int[5]; int numElements = 0; public void push(final int n) { data[numElements] = n; numElements++; } public int pop() { numElements--; return data[numElements]; } public int top() { return data[numElements - 1]; } public boolean empty() { return 0 == numElements; } }</programlisting> <para>Readers being familiar with stacks will immediately notice a deficiency in the above code: This stack is actually bounded. It only allows us to store a maximum number of five integer values.</para> <para>The following implementation allows us to functionally test our <classname>sda.unittesting.MyStack</classname> implementation with respect to the usual stack behaviour:</para> <programlisting language="none" linenumbering="numbered">package sda.unittesting; public class MyStackFuncTest { private static void assertTrue(boolean status) { if (!status) { throw new RuntimeException("Assert failed"); } } public static void main(String[] args) { final MyStack stack = new MyStack(); // Test 1: A new MyStack instance should not contain any elements. 
assertTrue(stack.empty()); // Test 2: Adding and removal stack.push(4); assertTrue (!stack.empty()); assertTrue (4 == stack.top()); assertTrue (4 == stack.pop()); assertTrue (stack.empty()); // Test 3: Trying to add more than five values stack.push(1);stack.push(2);stack.push(3);stack.push(4); stack.push(5); stack.push(6); assertTrue(6 == stack.pop()); } }</programlisting> <para>Execution yields a runtime exception which is due to the attempted insert operation <code>stack.push(6)</code>:</para> <programlisting language="none">Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5 at sda.unittesting.MyStack.push(MyStack.java:8) at sda.unittesting.MyStackFuncTest.main(MyStackFuncTest.java:20)</programlisting> <para>The execution result is easy to understand since our <classname>sda.unittesting.MyStack</classname> implementation can store at most five values.</para> <para>Our testing application is fine so far. It does however lack some features:</para> <itemizedlist> <listitem> <para>Automatic initialization before starting tests and finalization at the end.</para> </listitem> <listitem> <para>Our test is monolithic: We used comments to document different tests. This knowledge is implicit and thus invisible to testing frameworks. Test results (failure/success) cannot be attributed to individual tests such as test 1 or test 2.</para> </listitem> <listitem> <para>Aggregation and visualization of test results.</para> </listitem> <listitem> <para>Dependencies between individual tests.</para> </listitem> <listitem> <para>Ability to enable and disable tests according to a project's maturity level. In our example test 3 might be disabled until an unbounded implementation is completed.</para> </listitem> </itemizedlist> <para>Testing frameworks like <productname xlink:href="http://junit.org">JUnit</productname> or <productname xlink:href="http://testng.org">TestNG</productname> provide means for efficient and flexible test organization. 
Using <productname xlink:href="http://testng.org">TestNG</productname> our current test application including only test 1 and test 2 reads:</para> <programlisting language="none">package sda.unittesting; import org.testng.annotations.Test; public class MyStackTestSimple { final MyStack stack = new MyStack(); @Test public void empty() { assert(stack.empty()); } @Test public void pushPopEmpty() { assert (stack.empty()); stack.push(4); assert (!stack.empty()); assert (4 == stack.top()); assert (4 == stack.pop()); assert (stack.empty()); } }</programlisting> <para>We notice the absence of a <function>main()</function> method. Our testing framework uses the above code for test definitions. In contrast to our homebrew solution the individual tests are now defined in a machine readable fashion. This allows for sophisticated statistics. Executing inside <productname xlink:href="http://testng.org">TestNG</productname> produces the following results:</para> <programlisting language="none">PASSED: empty PASSED: pushPopEmpty =============================================== Default test Tests run: 2, Failures: 0, Skips: 0 =============================================== =============================================== Default suite Total tests run: 2, Failures: 0, Skips: 0 ===============================================</programlisting> <para>Both tests run successfully. So why did we omit test 3 which is bound to fail? We now add it to the test suite:</para> <programlisting language="none">package sda.unittesting; ... public class MyStackTestSimple1 { ... @Test public void empty() { assert(stack.empty()); ... @Test public void push6() { stack.push(1); stack.push(2); stack.push(3); stack.push(4); stack.push(5); stack.push(6); assert (6 == stack.pop()); } ...</programlisting> <para>As expected test 3 fails. 
But the result shows test 2 failing as well:</para> <programlisting language="none">PASSED: empty FAILED: push6 java.lang.ArrayIndexOutOfBoundsException: 5 at sda.unittesting.MyStack.push(MyStack.java:8) at sda.unittesting.MyStackTestSimple1.push6(MyStackTestSimple1.java:30) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ... FAILED: pushPopEmpty java.lang.AssertionError at sda.unittesting.MyStackTestSimple1.pushPopEmpty(MyStackTestSimple1.java:15) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ... =============================================== Default test Tests run: 3, Failures: 2, Skips: 0 ===============================================</programlisting> <para>This unexpected result is due to the execution order of the three individual tests. Within our class <classname>sda.unittesting.MyStackTestSimple1</classname> the three tests appear in the sequence test 1, test 2 and test 3. This however is just the order of source code. The testing framework will not infer any order and thus execute our three tests in <emphasis role="bold">arbitrary</emphasis> order. The execution log shows the actual order:</para> <orderedlist> <listitem> <para>Test <quote><code>empty</code></quote></para> </listitem> <listitem> <para>Test <quote><code>push6</code></quote></para> </listitem> <listitem> <para>Test <quote><code>pushPopEmpty</code></quote></para> </listitem> </orderedlist> <para>So the second test will raise an exception and leave the stack filled with the maximum possible five elements. 
Thus it is not empty and the <quote><code>pushPopEmpty</code></quote> test fails as well.</para> <para>If we want to avoid this type of error we may:</para> <itemizedlist> <listitem> <para>Declare tests within separate (test class) definitions.</para> </listitem> <listitem> <para>Define dependencies such as <quote>test X can only be executed after test Y</quote>.</para> </listitem> </itemizedlist> <para>The <productname xlink:href="http://testng.org">TestNG</productname> framework offers a feature which allows the definition of test groups and dependencies between them. We use this feature to refine our test definition:</para> <programlisting language="none">package sda.unittesting; ... public class MyStackTest { ... @Test (<emphasis role="bold">groups = "basic"</emphasis>) public void empty() { assert(stack.empty()); } @Test (<emphasis role="bold">groups = "basic"</emphasis>) public void pushPopEmpty() { ... } @Test (<emphasis role="bold">dependsOnGroups = "basic"</emphasis>) public void push6() { ... }</programlisting> <para>The first two tests now belong to the same test group <quote>basic</quote>. The <emphasis role="bold"><code>dependsOnGroups = "basic"</code></emphasis> declaration guarantees that our <code>push6</code> test will be launched only after both tests of this group have been executed. So we get the expected result:</para> <programlisting language="none">PASSED: empty PASSED: pushPopEmpty FAILED: push6 java.lang.ArrayIndexOutOfBoundsException: 5 at sda.unittesting.MyStack.push(MyStack.java:8) at sda.unittesting.MyStackTest.push6(MyStackTest.java:30) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ... =============================================== Default test Tests run: 3, Failures: 1, Skips: 0 ===============================================</programlisting> <para>In fact the order between the first two tests might be critical as well. The <quote><code>pushPopEmpty</code></quote> test leaves our stack in an empty state. 
If this were not the case, reversing the execution order of <quote><code>pushPopEmpty</code></quote> and <quote><code>empty</code></quote> would cause an error as well.</para> <para>Programming <abbrev xlink:href="http://en.wikipedia.org/wiki/Integrated_development_environment">IDE</abbrev>s like Eclipse provide elements for test result visualization. Our last test gets summarized as:</para> <screenshot> <info> <title><productname xlink:href="http://testng.org">TestNG</productname> result presentation in Eclipse</title> </info> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/eclipseTestngResult.screen.png" scale="75"/> </imageobject> </mediaobject> </screenshot> <para>We can drill down from a result of type failure to its occurrence within the corresponding code.</para> </chapter> <chapter xml:id="fo"> <title>Generating printed output</title> <titleabbrev>Print</titleabbrev> <section xml:id="foIntro"> <title>Online and print versions</title> <titleabbrev>online / print</titleabbrev> <para>We already learned how to transform XML documents into HTML by means of an <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> style sheet processor. In principle we may create printed output by using an HTML browser's print function. However the result will not meet reasonable typographical standards. A list of commonly required features for printed output includes:</para> <variablelist> <varlistentry> <term>Line breaks</term> <listitem> <para>Text paragraphs have to be divided into lines. To achieve best results the processor must implement the hyphenation rules of the language in question in order to automatically hyphenate long words. This is especially important for text columns of limited width as appearing in newspapers.</para> </listitem> </varlistentry> <varlistentry> <term>Page breaks</term> <listitem> <para>Since printed pages are limited in height the content has to be broken into pages. 
This may be difficult to achieve:</para> <itemizedlist> <listitem> <para>Large images being indivisible may have to be deferred to the following page leaving large amounts of empty space.</para> </listitem> <listitem> <para>Long tables may have to be subdivided into smaller blocks. Thus it may be required to define sets of additional footers like <quote>to be continued on the next page</quote> and additional table headers containing column descriptions on subsequent pages.</para> </listitem> </itemizedlist> </listitem> </varlistentry> <varlistentry> <term>Page references</term> <listitem> <para>Document internal references via <link xlink:href="http://www.w3.org/TR/xml#id">ID</link> / <link xlink:href="http://www.w3.org/TR/xml#idref">IDREF</link> pairs may be represented as page references like <quote>see page 32</quote>.</para> </listitem> </varlistentry> <varlistentry> <term>Left and right pages</term> <listitem> <para>Books usually have a different layout for <quote>left</quote> and <quote>right</quote> pages. Page numbers usually appear on the left side of a <quote>left</quote> page and vice versa.</para> <para>Very often the head of each page contains additional information e.g. a chapter's name on each <quote>left</quote> page head and the actual section's name on each <quote>right</quote> page's head.</para> <para>In addition chapters usually start on a <quote>right</quote> page. Sometimes a chapter's starting page has special layout features e.g. 
a missing description in the page's head which will only be given on subsequent pages.</para> </listitem> </varlistentry> <varlistentry> <term>Footnotes</term> <listitem> <para>Footnotes have to be numbered on a per page basis and have to appear on the current page.</para> </listitem> </varlistentry> </variablelist> </section> <section xml:id="foStart"> <title>A simple <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> document</title> <titleabbrev>Simple <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev></titleabbrev> <para>A renderer for printed output from XML content also needs instructions on how to format the different elements. A common way to define these formatting properties is by using the <emphasis>Formatting Objects</emphasis> (<abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev>) standard. <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> documents may be compared to HTML. An HTML document has to be rendered by a piece of software called a browser in order to be viewed as an image. Likewise <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> documents have to be rendered by a piece of software called a formatting objects processor which typically yields PostScript or PDF output. 
As a starting point we take a simple example:</para> <figure xml:id="foHelloWorld"> <title>The simplest <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> document</title> <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"> <fo:layout-master-set> <!-- Define a simple page layout --> <fo:simple-page-master master-name="simplePageLayout" page-width="60mm" page-height="100mm"> <fo:region-body/> </fo:simple-page-master> </fo:layout-master-set> <!-- Print a set of pages using the previously defined layout --> <fo:page-sequence master-reference="simplePageLayout"> <fo:flow flow-name="xsl-region-body"> <emphasis role="bold"><fo:block>Hello, World ...</fo:block></emphasis> </fo:flow> </fo:page-sequence> </fo:root></programlisting> </figure> <para>PDF generation is initiated by executing an <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> processor. At the MI department the script <code>fo2pdf</code> invokes <orgname>RenderX</orgname>'s <productname xlink:href="http://www.renderx.com">xep</productname> processor:</para> <programlisting language="none">fo2pdf -fo hello.fo -pdf hello.pdf</programlisting> <para>This creates a PDF file which may be printed or previewed by e.g. <productname xlink:href="http://www.adobe.com">Adobe</productname>'s Acrobat Reader or evince under Linux. For a list of command line options see <productname xlink:href="http://www.renderx.com/reference.html">xep's documentation</productname>.</para> </section> <section xml:id="layoutParam"> <title>Page layout</title> <para>The result of our <quote>Hello, World ...</quote> code is not very impressive. In order to develop more elaborate examples we have to understand the underlying layout model being defined in a <link xlink:href="http://www.w3.org/TR/xsl/#fo_simple-page-master">fo:simple-page-master</link> element. 
First of all <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> allows us to subdivide a physical page into different regions:</para> <figure xml:id="foRegionList"> <title>Regions being defined in a page.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/regions.fig"/> </imageobject> </mediaobject> </figure> <para>The most important area in this model is denoted by <link xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link>. Other regions like <link xlink:href="http://www.w3.org/TR/xsl/#fo_region-before">fo:region-before</link> are typically used as containers for meta information such as chapter headings and page numbering. We take a closer look at the <link xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link> area and supply an example of parameterization:</para> <figure xml:id="foParamRegBody"> <title>A complete <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> parameterization of a physical page and the <link xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link>.</title> <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" font-size="6pt"> <fo:layout-master-set> <co xml:id="programlisting_fobodyreg_masterset"/> <fo:simple-page-master master-name="<emphasis role="bold">simplePageLayout</emphasis>" <co xml:id="programlisting_fobodyreg_simplepagelayout"/> page-width = "50mm" page-height = "80mm" margin-top = "5mm" margin-bottom = "20mm" margin-left = "5mm" margin-right = "10mm"> <fo:region-body <co xml:id="programlisting_fobodyreg_regionbody"/> margin-top = "10mm" margin-bottom = "5mm" margin-left = "10mm" margin-right = "5mm"/> </fo:simple-page-master> </fo:layout-master-set> <fo:page-sequence master-reference="<emphasis role="bold">simplePageLayout</emphasis>"> <co xml:id="programlisting_fobodyreg_pagesequence"/> <fo:flow 
flow-name="xsl-region-body"> <co xml:id="programlisting_fobodyreg_flow"/> <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <co xml:id="programlisting_fobodyreg_block"/> <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref linkend="programlisting_fobodyreg_block"/> <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref linkend="programlisting_fobodyreg_block"/> <fo:block space-after="2mm">Dumb text .. dumb text.</fo:block> <coref linkend="programlisting_fobodyreg_block"/> </fo:flow> </fo:page-sequence> </fo:root></programlisting> </figure> <calloutlist> <callout arearefs="programlisting_fobodyreg_masterset"> <para>As the name suggests multiple layout definitions can appear here. In this example only one layout is defined.</para> </callout> <callout arearefs="programlisting_fobodyreg_simplepagelayout"> <para>Each layout definition carries a key attribute <code>master-name</code> being unique with respect to all defined layouts appearing in <emphasis>the</emphasis> <tag class="starttag">fo:layout-master-set</tag>. We may thus call it a <emphasis>primary key</emphasis> attribute. The current layout definition's key has the value <code>simplePageLayout</code>. The length specifications appearing here are visualized in <xref linkend="paramRegBodyVisul"/> and correspond to the white rectangle.</para> </callout> <callout arearefs="programlisting_fobodyreg_regionbody"> <para>Each layout definition <emphasis>must</emphasis> have a region body being the region in which the document's main text flow will appear. A layout definition <emphasis>may</emphasis> also define top, bottom and side regions as we will see <link linkend="paramHeadFoot">later</link>. 
The body region is shown with pink background in <xref linkend="paramRegBodyVisul"/>.</para> </callout> <callout arearefs="programlisting_fobodyreg_pagesequence"> <para>An <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> document may have multiple page sequences, for example one per chapter of a book. Each page sequence <emphasis>must</emphasis> reference an <emphasis>existing</emphasis> layout definition via its <code>master-reference</code> attribute. So we may regard this attribute as a foreign key targeting the set of all defined layout definitions.</para> </callout> <callout arearefs="programlisting_fobodyreg_flow"> <para>A flow defines the region in which output shall appear. In the current example there is only one layout, and it contains a single region of type body that is able to receive text output.</para> </callout> <callout arearefs="programlisting_fobodyreg_block"> <para>A <tag class="starttag">fo:block</tag> element may be compared to a paragraph element <tag class="starttag">p</tag> in HTML. The attribute <link xlink:href="http://www.w3.org/TR/xsl/#space-after">space-after</link>="2mm" adds a space of two mm after each <link xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> container.</para> </callout> </calloutlist> <para>The result looks like:</para> <figure xml:id="paramRegBodyVisul"> <title>Parameterizing page- and region view port. All length dimensions are in mm.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/overlay.fig"/> </imageobject> </mediaobject> </figure> </section> <section xml:id="headFoot"> <title>Headers and footers</title> <titleabbrev>Header/footer</titleabbrev> <para>Referring to <xref linkend="foRegionList"/> we now want to add fixed headers and footers frequently being used for page numbers. In a textbook each page might have the actual chapter's name in its header. 
This name should not change as long as the text below <link xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link> still belongs to the same chapter. In <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> this is achieved by:</para> <itemizedlist> <listitem> <para>Encapsulating each chapter's content in a <link xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link> of its own.</para> </listitem> <listitem> <para>Defining the desired header text below <link xlink:href="http://www.w3.org/TR/xsl/#fo_static-content">fo:static-content</link> in the area defined by <link xlink:href="http://www.w3.org/TR/xsl/#fo_region-before">fo:region-before</link>.</para> </listitem> </itemizedlist> <para>The notion <link xlink:href="http://www.w3.org/TR/xsl/#fo_static-content">fo:static-content</link> refers to the fact that the content is constant (static) within the given page sequence. The new version reads:</para> <figure xml:id="paramHeadFoot"> <title>Parameterizing header and footer.</title> <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" font-size="6pt"> <fo:layout-master-set> <fo:simple-page-master master-name="simplePageLayout" page-width = "50mm" page-height = "80mm" margin-top = "5mm" margin-bottom = "20mm" margin-left = "5mm" margin-right = "10mm"> <fo:region-body margin-top = "10mm" margin-bottom = "5mm" <co xml:id="programlisting_head_foot_bodydef"/> margin-left = "10mm" margin-right = "5mm"/> <fo:region-before extent="5mm"/> <co xml:id="programlisting_head_foot_beforedef"/> <fo:region-after extent="5mm"/> <co xml:id="programlisting_head_foot_afterdef"/> </fo:simple-page-master> </fo:layout-master-set> <fo:page-sequence master-reference="simplePageLayout"> <fo:static-content flow-name="xsl-region-before"> <co xml:id="programlisting_head_foot_beforeflow"/> <fo:block font-weight="bold" 
font-size="8pt">Headertext</fo:block> </fo:static-content> <fo:static-content flow-name="xsl-region-after"> <co xml:id="programlisting_head_foot_afterflow"/> <fo:block> <fo:page-number/> </fo:block> </fo:static-content> <fo:flow flow-name="xsl-region-body"> <fo:block space-after="8mm">Dumb text .. dumb text.</fo:block> <fo:block space-after="8mm">Dumb text .. dumb text.</fo:block> <fo:block space-after="8mm">More text .. more text.</fo:block> <fo:block space-after="8mm">More text .. more text.</fo:block> <fo:block space-after="8mm">More text .. more text.</fo:block> </fo:flow> </fo:page-sequence> </fo:root></programlisting> </figure> <calloutlist> <callout arearefs="programlisting_head_foot_bodydef"> <para>Defining the body region.</para> </callout> <callout arearefs="programlisting_head_foot_beforedef programlisting_head_foot_afterdef"> <para>Defining two regions at the top and bottom of each page. The <code>extent</code> attribute denotes the height of these regions. <emphasis>Caveat</emphasis>: The attribute <code>extent</code>'s value gets subtracted from the <code>margin-top</code> or <code>margin-bottom</code> value being defined in the corresponding <tag class="starttag">fo:region-body</tag> element. So if we consider for example the <tag>fo:region-before</tag> we have to obey:</para> <para>extent <= margin-top</para> <para>Otherwise we may not even see any output.</para> </callout> <callout arearefs="programlisting_head_foot_beforeflow"> <para>A <code>fo:static-content</code> denotes text portions which are decoupled from the <quote>usual</quote> text flow. For example as a book's chapter advances over multiple pages we expect the constant chapter's title to appear on top of each page. In the current example the static string <code>Headertext</code> will appear on each page's top for the whole <tag class="starttag">fo:page-sequence</tag> in which it is defined. 
Notice the <code>flow-name="xsl-region-before"</code> reference to the region being defined in <coref linkend="programlisting_head_foot_beforedef"/>.</para> </callout> <callout arearefs="programlisting_head_foot_afterflow"> <para>We do the same here for the page's footer. Instead of static text we output <tag>fo:page-number</tag> yielding the current page's number.</para> <para>This time <code>flow-name="xsl-region-after"</code> references the region definition in <coref linkend="programlisting_head_foot_afterdef"/>. Actually the attribute <code>flow-name</code> is restricted to the following five values corresponding to all possible region definitions within a layout:</para> <informaltable> <?dbhtml table-width="50%" ?> <?dbfo table-width="50%" ?> <tgroup cols="2"> <colspec align="left" colwidth="1*"/> <colspec align="left" colwidth="1*"/> <tbody> <row> <entry><tag class="starttag">fo:region-body</tag></entry> <entry>xsl-region-body</entry> </row> <row> <entry><tag class="starttag">fo:region-before</tag></entry> <entry>xsl-region-before</entry> </row> <row> <entry><tag class="starttag">fo:region-after</tag></entry> <entry>xsl-region-after</entry> </row> <row> <entry><tag class="starttag">fo:region-start</tag></entry> <entry>xsl-region-start</entry> </row> <row> <entry><tag class="starttag">fo:region-end</tag></entry> <entry>xsl-region-end</entry> </row> </tbody> </tgroup> </informaltable> </callout> </calloutlist> <para>This results in two pages with page numbers 1 and 2:</para> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/headfoot.fig"/> </imageobject> </mediaobject> <para>The free chapter from the <xref linkend="bib_Harold04"/> book contains additional information on extended <link xlink:href="http://www.cafeconleche.org/books/bible2/chapters/ch18.html#d1e2250">layout definitions</link>. 
The <orgname xlink:href="http://w3.org">W3C</orgname> as the holder of the FO standard defines the elements <link xlink:href="http://www.w3.org/TR/xsl/#fo_layout-master-set">fo:layout-master-set</link>, <link xlink:href="http://www.w3.org/TR/xsl/#fo_simple-page-master">fo:simple-page-master</link> and <link xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link>.</para> </section> <section xml:id="foContainer"> <title>Important Objects</title> <section xml:id="fo_block"> <title><code>fo:block</code></title> <para>The FO standard borrows a lot from the CSS standard. Most formatting objects may have <link xlink:href="http://www.w3.org/TR/xsl/#section-N19349-Description-of-Property-Groups">CSS like properties</link> with similar semantics; some properties have been added. We take a <link xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> container as an example:</para> <figure xml:id="blockInline"> <title>A <link xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> with a <link xlink:href="http://www.w3.org/TR/xsl/#fo_inline">fo:inline</link> descendant.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/blockprop.fo.pdf"/> </imageobject> </mediaobject> <programlisting language="none">... <fo:block font-weight='bold' border-bottom-style='dashed' border-style='solid' border='1mm'>A lot of attributes and <fo:inline background-color='black' color='white'>inverted</fo:inline> text.</fo:block> ...</programlisting> </figure> <para>The <link xlink:href="http://www.w3.org/TR/xsl/#fo_inline">fo:inline</link> descendant serves as a means to change the <quote>current</quote> property set. 
In HTML/CSS this may be achieved by using the <code>SPAN</code> tag:</para> <programlisting language="none"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <title>Blocks/spans and CSS</title> </head> <body> <h1>Blocks/spans and CSS</h1> <p style="font-weight: bold; border: 1mm; border-style: solid; border-bottom-style: dashed;" >A lot of attributes and <span style="color: white;background-color: black;" >inverted</span> text.</p> </body> </html></programlisting> <para>Though the properties are encapsulated in a <code>style</code> attribute here, we find a one-to-one correspondence between FO and CSS in this case. The HTML rendering works as expected:<mediaobject> <imageobject> <imagedata fileref="Ref/Screen/mozparaspancss.screen.png"/> </imageobject> </mediaobject></para> </section> <section xml:id="fo_list"> <title>Lists</title> <para>The simplest lists are unlabeled (itemized) lists as expressed by the <code>UL</code>/<code>LI</code> tags in HTML. FO allows a much more detailed parametrization regarding indents and distances between labels and item content. Relevant elements are <link xlink:href="http://www.w3.org/TR/xsl/#fo_list-block">fo:list-block</link>, <link xlink:href="http://www.w3.org/TR/xsl/#fo_list-item">fo:list-item</link> and <link xlink:href="http://www.w3.org/TR/xsl/#fo_list-item-body">fo:list-item-body</link>. The drawback is a more complex setup for <quote>default</quote> lists:</para> <figure xml:id="listItemize"> <title>An itemized list and result.</title> <programlisting language="none">... 
<fo:list-block provisional-distance-between-starts="2mm"> <fo:list-item> <fo:list-item-label end-indent="label-end()"> <fo:block>&#8226;</fo:block> </fo:list-item-label> <fo:list-item-body start-indent="body-start()"> <fo:block>Flowers</fo:block> </fo:list-item-body> </fo:list-item> <fo:list-item> <fo:list-item-label end-indent="label-end()"> <fo:block>&#8226;</fo:block> </fo:list-item-label> <fo:list-item-body start-indent="body-start()"> <fo:block>Animals</fo:block> </fo:list-item-body> </fo:list-item> </fo:list-block> ...</programlisting> <mediaobject> <imageobject> <imagedata align="left" fileref="Ref/Fig/itemize.fo.pdf"/> </imageobject> </mediaobject> </figure> <para>The result looks somewhat primitive in relation to the amount of source code it requires. The power of these constructs shows when formatting nested lists of possibly different types, such as enumerations or definition lists, to a high typographical standard. More complex examples are presented in the <link xlink:href="http://www.cafeconleche.org/books/bible2/chapters/ch18.html#d1e4979">XML Bible</link> by <xref linkend="bib_Harold04"/>.</para> </section> <section xml:id="leaderRule"> <title>Leaders and rules</title> <titleabbrev>Leaders/rules</titleabbrev> <para>Sometimes adjustable horizontal space between two neighbouring objects has to be filled, e.g. in a book's table of contents. The <link xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> serves this purpose:</para> <figure xml:id="leaderToc"> <title>Two simulated entries in a table of contents.</title> <programlisting language="none">... 
<fo:block text-align-last='justify'>Valid XML<fo:leader leader-pattern="dots"/> page 7</fo:block> <fo:block text-align-last='justify'>XSL <fo:leader leader-pattern='dots'/> page 42</fo:block> ...</programlisting> <mediaobject> <imageobject> <imagedata align="left" fileref="Ref/Fig/leader.fo.pdf"/> </imageobject> </mediaobject> </figure> <para>The attribute value <link xlink:href="http://www.w3.org/TR/xsl/#text-align-last">text-align-last</link> = <code>'justify'</code> forces the <link xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> to extend to the available width of the current <link xlink:href="http://www.w3.org/TR/xsl/#fo_region-body">fo:region-body</link> area. The <link xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> inserts the necessary amount of content of the type specified in <link xlink:href="http://www.w3.org/TR/xsl/#leader-pattern">leader-pattern</link> to fill the gap between its neighbouring components. This principle can be extended to multiple objects:</para> <figure xml:id="leaderMulti"> <title>Four entries separated by equal amounts of dotted space.</title> <programlisting language="none"><fo:block text-align-last='justify'>A<fo:leader leader-pattern="dots"/>B<fo:leader leader-pattern="dots"/>C<fo:leader leader-pattern="dots"/>D</fo:block></programlisting> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/leadermulti.fo.pdf"/> </imageobject> </mediaobject> </figure> <para>A <link xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> may also be used to draw horizontal lines to separate objects. In this case there are no neighbouring components within the <quote>current</quote> line in which the <link xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> appears.
This is frequently used to draw a border between <code>xsl-region-body</code> and <code>xsl-region-before</code> and/or <code>xsl-region-after</code>:</para> <figure xml:id="leaderSeparate"> <title>A horizontal line separator between header and body of a page.</title> <programlisting language="none">... <fo:page-sequence master-reference="simplePageLayout"> <fo:static-content flow-name="xsl-region-before"> <fo:block text-align-last='justify'>FO<fo:leader/>page 5</fo:block> <fo:block text-align-last='justify'> <fo:leader leader-pattern="rule" leader-length="100%"/> </fo:block> </fo:static-content> <fo:flow flow-name="xsl-region-body"> <fo:block>Some body text ...</fo:block> </fo:flow> </fo:page-sequence>...</programlisting> <mediaobject> <imageobject> <imagedata align="left" fileref="Ref/Fig/separate.fo.pdf"/> </imageobject> </mediaobject> </figure> <para>Note the empty leader <code><</code> <link xlink:href="http://www.w3.org/TR/xsl/#fo_leader">fo:leader</link> <code>/></code> between the <quote> <code>FO</code> </quote> and the <quote>page 5</quote> text nodes: it inserts horizontal whitespace so that the page number is pushed to the header's right edge. This is in accordance with the <link xlink:href="http://www.w3.org/TR/xsl/#leader-pattern">leader-pattern</link> attribute's default value <code>space</code>.</para> </section> <section xml:id="pageNumbering"> <title>Page numbers</title> <para>We already saw an example of page numbering via <link xlink:href="http://www.w3.org/TR/xsl/#fo_page-number">fo:page-number</link> in <xref linkend="paramHeadFoot"/>. Sometimes a different style for page numbering is desired. The default page numbering style may be changed by means of the <link xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link> element's attribute <link xlink:href="http://www.w3.org/TR/xsl/#format">format</link>.
For a detailed explanation the <link xlink:href="http://www.w3.org/TR/2007/REC-xslt20-20070123/#convert">W3C XSLT standard documentation</link> may be consulted:</para> <figure xml:id="pageNumberingRoman"> <title>Roman style page numbers.</title> <programlisting language="none">... <fo:page-sequence format="i" master-reference="simplePageLayout"> <fo:static-content flow-name="xsl-region-after"> <fo:block text-align-last='justify'> <fo:leader leader-pattern="rule" leader-length="100%"/> </fo:block> <fo:block font-weight="bold"> <fo:page-number/> </fo:block> </fo:static-content> <fo:flow flow-name="xsl-region-body"> <fo:block>Some text...</fo:block> <fo:block>More text, more text, more text.</fo:block> <fo:block>More text, more text, more text.</fo:block> <fo:block>Enough text.</fo:block> </fo:flow> </fo:page-sequence> ...</programlisting> <mediaobject> <imageobject> <imagedata align="left" fileref="Ref/Fig/pageStack.fig"/> </imageobject> </mediaobject> </figure> </section> <section xml:id="foMarker"> <title>Marker</title> <figure xml:id="dictionary"> <title>A dictionary with running page headers.</title> <programlisting language="none">... 
<fo:page-sequence master-reference="simplePageLayout"> <fo:static-content flow-name="xsl-region-before"> <fo:block font-weight="bold"> <fo:retrieve-marker retrieve-class-name="alpha" retrieve-position="first-starting-within-page" />-<fo:retrieve-marker retrieve-position="last-starting-within-page" retrieve-class-name="alpha"/> </fo:block> <fo:block text-align-last='justify'> <fo:leader leader-pattern="rule" leader-length="100%"/></fo:block> </fo:static-content> <fo:flow flow-name="xsl-region-body"> <fo:block> <fo:marker marker-class-name="alpha">A </fo:marker>Ant</fo:block> <fo:block> <fo:marker marker-class-name="alpha">B </fo:marker>Bug</fo:block> <fo:block> <fo:marker marker-class-name="alpha">L </fo:marker>Lion</fo:block> <fo:block> <fo:marker marker-class-name="alpha">N </fo:marker>Nose</fo:block> <fo:block> <fo:marker marker-class-name="alpha">P </fo:marker>Peg</fo:block> </fo:flow> </fo:page-sequence> ...</programlisting> <mediaobject> <imageobject> <imagedata align="left" fileref="Ref/Fig/dictionaryStack.fig"/> </imageobject> </mediaobject> </figure> </section> <section xml:id="foIntRef"> <title>Internal references</title> <titleabbrev>References</titleabbrev> <para>Regarding printed documents we may define two categories of document internal references:</para> <variablelist> <varlistentry> <term><emphasis>Page number references</emphasis></term> <listitem> <para>This is the <quote>classical</quote> type of a reference e.g. in books. An author refers the reader to a distant location by writing <quote>... see further explanation in section 4.5 on page 234</quote>. A book's table of contents assigning page numbers to topics is another example. This way the implementation of a reference relies solely on the features a printed document offers.</para> </listitem> </varlistentry> <varlistentry> <term><emphasis>Hypertext references</emphasis></term> <listitem> <para>This way of implementing references utilizes features of (online) viewers for printable documents. 
For example, PDF viewers like <productname xlink:href="http://www.adobe.com">Adobe's Acrobat Reader</productname> or the Evince application are able to follow hypertext links in a fashion known from HTML browsers. This viewer feature is based on hypertext capabilities defined in Adobe's PDF de-facto standard.</para> </listitem> </varlistentry> </variablelist> <para>Of course the second type of reference is limited to people who use an online viewer application instead of reading a document on physical paper.</para> <para>We now show the implementation of <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> based page references. As already discussed for <link xlink:href="http://www.w3.org/TR/xml#id">ID</link> / <link xlink:href="http://www.w3.org/TR/xml#idref">IDREF</link> pairs, we need a link destination (anchor) and a link source. The <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> standard uses the same anchor implementation as in XML for <link xlink:href="http://www.w3.org/TR/xml#id">ID</link> typed attributes: <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> objects <emphasis>may</emphasis> have an attribute <link xlink:href="http://www.w3.org/TR/xsl/#id">id</link> with a document-wide unique value. The <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> element <link xlink:href="http://www.w3.org/TR/xsl/#fo_page-number-citation">fo:page-number-citation</link> is used to actually create a page reference via its attribute <link xlink:href="http://www.w3.org/TR/xsl/#ref-id">ref-id</link>:</para> <figure xml:id="refJavaXml"> <title>Two blocks mutually referencing each other by page number.</title> <programlisting language="none">... <fo:flow flow-name='xsl-region-body'> <fo:block id='xml'>Java section see page <fo:page-number-citation ref-id='java'/>. 
</fo:block> <fo:block id='java'>XML section see page <fo:page-number-citation ref-id='xml'/>. </fo:block> </fo:flow> ...</programlisting> <mediaobject> <imageobject> <imagedata align="left" fileref="Ref/Fig/pagerefStack.fig"/> </imageobject> </mediaobject> </figure> <para>NB: Be careful when defining <link xlink:href="http://www.w3.org/TR/xsl/#id">id</link> attributes for objects that are descendants of <link xlink:href="http://www.w3.org/TR/xsl/#fo_static-content">fo:static-content</link> nodes. Such objects typically appear on multiple pages and are therefore not unique anchors. A reference carrying such an id value thus actually refers to n ≥ 1 objects on n different pages. Typically a user agent will choose the first object of this set when the link is clicked. So in effect the parent <link xlink:href="http://www.w3.org/TR/xsl/#fo_page-sequence">fo:page-sequence</link> is chosen as the effective link target.</para> <para>The element <link xlink:href="http://www.w3.org/TR/xsl/#fo_basic-link">fo:basic-link</link> creates PDF hypertext links.
We extend the previous example:</para> <figure xml:id="refJavaXmlHyper"> <title>Two blocks with mutual page- and hypertext references.</title> <programlisting language="none"><fo:flow flow-name='xsl-region-body'> <fo:block id='xml'>Java section see <fo:basic-link color="blue" internal-destination="java">page <fo:page-number-citation ref-id='java'/>.</fo:basic-link></fo:block> <fo:block id='java'>XML section see <fo:basic-link color="blue" internal-destination="xml">page <fo:page-number-citation ref-id='xml'/>.</fo:basic-link></fo:block> </fo:flow></programlisting> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/pagerefhyperStack.fig"/> </imageobject> </mediaobject> </figure> </section> <section xml:id="pdfBookmarks"> <title>PDF bookmarks</title> <titleabbrev>Bookmarks</titleabbrev> <para>The PDF specification allows the definition of so-called bookmarks offering explorer-like navigation:</para> <mediaobject> <imageobject> <imagedata fileref="Ref/Screen/pdfbookmarks.screen.png"/> </imageobject> </mediaobject> <para>PDF bookmarks are <link xlink:href="http://www.w3.org/TR/2006/REC-xsl11-20061205/#d0e14206">part of the XSL-FO 1.1</link> standard. Some <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> processors still use proprietary solutions for bookmark creation dating back to the older <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> 1.0 standard.
For details of bookmark extensions by <orgname>RenderX</orgname>'s processor see <link xlink:href="http://www.renderx.com/tutorial.html#PDF_Bookmarks">XEP's documentation</link>.</para> </section> </section> <section xml:id="xml2fo"> <title>Constructing <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> from XML documents</title> <titleabbrev><abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> from XML</titleabbrev> <para>So far we have learnt some basic <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> elements. As with HTML we typically generate FO code from other sources rather than crafting it by hand. The general picture is:</para> <figure xml:id="htmlFoProduction"> <title>Different target formats from common source.</title> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/crossmedia.fig" scale="65"/> </imageobject> <caption> <para>We may generate both online and printed documentation from a common source. This requires style sheets for each of the desired destination formats.</para> </caption> </mediaobject> </figure> <para>We discussed the <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> standard as an input format for printable output production by a renderer. In this respect an <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> document is similar to HTML, which is a format to be rendered by a web browser for visual (screen oriented) output production. The transformation from an XML source (e.g. a memo document) to <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> is still missing. As for HTML we may use <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev> as a transformation means.
We generate the sender's surname from a memo document instance:</para> <figure xml:id="memo2fosurname"> <title>Generating a sender's surname for printing.</title> <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> <xsl:stylesheet version="1.0" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" indent="yes"/> <xsl:template match="/"> <fo:root> <fo:layout-master-set> <fo:simple-page-master master-name="simplePageLayout" page-width="294mm" page-height="210mm" margin="5mm"> <fo:region-body margin="15mm"/> </fo:simple-page-master> </fo:layout-master-set> <fo:page-sequence master-reference="simplePageLayout"> <fo:flow flow-name="xsl-region-body"> <fo:block font-size="20pt"> <xsl:text>Sender:</xsl:text> <fo:inline font-weight='bold'> <xsl:value-of select="memo/from/surname"/> </fo:inline> </fo:block> </fo:flow> </fo:page-sequence> </fo:root> </xsl:template> </xsl:stylesheet></programlisting> </figure> <para>A suitable XML document instance reads:</para> <figure xml:id="memoMessage"> <title>A <code>memo</code> document instance.</title> <programlisting language="none"><memo ...="memo.xsd"> <from> <name>Martin</name> <surname>Goik</surname> </from> <to> <name>Adam</name> <surname>Hacker</surname> </to> <to> <name>Eve</name> <surname>Intruder</surname> </to> <date year="2005" month="1" day="6"/> <subject>Firewall problems</subject> <content> <para>Thanks for your excellent work.</para> <para>Our firewall is definitely broken!</para> </content> </memo></programlisting> </figure> <para>Some remarks:</para> <orderedlist> <listitem> <para>The <link xlink:href="http://www.w3.org/TR/2007/REC-xslt20-20070123/#element-stylesheet">xsl:stylesheet</link> element contains a namespace declaration for the target FO document's namespace, namely:</para> <programlisting language="none">xmlns:fo="http://www.w3.org/1999/XSL/Format"</programlisting> <para>This is required to use elements like <link
xlink:href="http://www.w3.org/TR/xsl/#fo_block">fo:block</link> belonging to the FO namespace.</para> </listitem> <listitem> <para>The attribute value <code>indent="yes"</code> in <link xlink:href="http://www.w3.org/TR/2007/REC-xslt20-20070123/#element-output">xsl:output</link> is usually set to "no" in a production environment to avoid whitespace-related problems.</para> </listitem> <listitem> <para>The generation of a print format like PDF is actually a two-step process. To generate message.pdf from message.xml by a stylesheet memo2fo.xsl we need the following calls:</para> <variablelist> <varlistentry> <term><emphasis>XML document instance to FO</emphasis></term> <listitem> <programlisting language="none">xml2xml message.xml memo2fo.xsl -o message.fo</programlisting> </listitem> </varlistentry> <varlistentry> <term><emphasis>FO to PDF</emphasis></term> <listitem> <programlisting language="none">fo2pdf -fo message.fo -pdf message.pdf</programlisting> </listitem> </varlistentry> </variablelist> <mediaobject> <imageobject> <imagedata fileref="Ref/Fig/xml2fo2pdf.fig"/> </imageobject> </mediaobject> <para>When debugging of the intermediate <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> file is not required, both steps may be combined into a single call:</para> <programlisting language="none">fo2pdf -xml message.xml -xsl memo2fo.xsl -pdf message.pdf</programlisting> </listitem> </orderedlist> </section> <section xml:id="foCatalog"> <title>Formatting a catalog.</title> <titleabbrev>A catalog</titleabbrev> <para>We now take the <link linkend="climbingCatalog">climbing catalog example</link> with prices added and incrementally create a series of PDF versions, each improving on the previous one.</para> <qandaset defaultlabel="qanda" xml:id="idCatalogStart"> <title>A first PDF version of the catalog</title> <qandadiv> <qandaentry> <question> <para>Write an <abbrev xlink:href="http://www.w3.org/Style/XSL">XSL</abbrev>
script to generate a starting version <filename xlink:href="Ref/src/Dom/climbenriched.start.pdf">climbenriched.start.pdf</filename>.</para> </question> <answer> <programlisting language="none"><?xml version="1.0" encoding="utf-8"?> <xsl:stylesheet version="1.0" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" indent="yes"/> <xsl:template match="/"> <fo:root font-size="10pt"> <fo:layout-master-set> <fo:simple-page-master master-name="productPage" page-width="80mm" page-height="110mm" margin="5mm"> <fo:region-body margin="15mm"/> <fo:region-before extent="10mm"/> </fo:simple-page-master> </fo:layout-master-set> <xsl:apply-templates select="catalog/product" /> </fo:root> </xsl:template> <xsl:template match="product"> <fo:page-sequence master-reference="productPage"> <fo:static-content flow-name="xsl-region-before"> <fo:block font-weight="bold"> <xsl:value-of select="title"/> </fo:block> </fo:static-content> <fo:flow flow-name="xsl-region-body"> <xsl:apply-templates select="description/para"/> <fo:block>Price:<xsl:value-of select="@price"/></fo:block> <fo:block>Order no:<xsl:value-of select="@id"/></fo:block> </fo:flow> </fo:page-sequence> </xsl:template> <xsl:template match="para"> <fo:block space-after="10px"> <xsl:value-of select="."/> </fo:block> </xsl:template> </xsl:stylesheet></programlisting> </answer> </qandaentry> <qandaentry xml:id="idCatalogProduct"> <question> <label>Header, page numbers and table formatting</label> <para>Extend <xref linkend="idCatalogStart"/> by adding page numbers. The order number and prices shall be formatted as tables. Add a horizontal rule to each page's header.
The result should look like <filename xlink:href="Ref/src/Dom/climbenriched.product.pdf">climbenriched.product.pdf</filename>.</para> </question> <answer> <para>Solution see <filename xlink:href="Ref/src/Dom/catalog2fo.product.xsl">catalog2fo.product.xsl</filename>.</para> </answer> </qandaentry> <qandaentry xml:id="idCatalogToc"> <question> <label>A table of contents.</label> <para>Each product description's page number shall appear in a table of contents together with the product's <code>title</code> as in <filename xlink:href="Ref/src/Dom/climbenriched.toc.pdf">climbenriched.toc.pdf</filename>.</para> </question> <answer> <para>Solution see <filename xlink:href="Ref/src/Dom/catalog2fo.toc.xsl">catalog2fo.toc.xsl</filename>.</para> </answer> </qandaentry> <qandaentry xml:id="idCatalogToclink"> <question> <label>A table of contents with hypertext links.</label> <para>The table of contents' entries may offer hypertext features to supporting viewers as in <filename xlink:href="Ref/src/Dom/climbenriched.toclink.pdf">climbenriched.toclink.pdf</filename>. In addition include the document's <tag class="starttag">introduction</tag>.</para> </question> <answer> <para>Solution see <filename xlink:href="Ref/src/Dom/catalog2fo.toclink.xsl">catalog2fo.toclink.xsl</filename>.</para> </answer> </qandaentry> <qandaentry xml:id="idCatalogFinal"> <question> <label>A final version.</label> <para>Add the following features:</para> <orderedlist> <listitem> <para>Number the table of contents starting with page i, ii, iii, iv and so on. Start the product descriptions with page 1. On each page's footer a text <quote>page xx of yy</quote> shall be displayed.
This requires the definition of an anchor <code>id</code> on the <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> document's last page.</para> </listitem> <listitem> <para>Add PDF bookmarks by using <orgname>XEP</orgname>'s <abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section">FO</abbrev> extensions. This requires the namespace declaration <code>xmlns:rx="http://www.renderx.com/XSL/Extensions"</code> in the XSLT script's header.</para> </listitem> </orderedlist> <para>The result may look like <filename xlink:href="Ref/src/Dom/climbenriched.final.pdf">climbenriched.final.pdf</filename>. N.B.: It may take some effort to achieve this result. This effort is left to the <emphasis>interested</emphasis> participants.</para> </question> <answer> <para>Solution see <filename xlink:href="Ref/src/Dom/catalog2fo.toclink.xsl">catalog2fo.toclink.xsl</filename>.</para> </answer> </qandaentry> </qandadiv> </qandaset> </section> </chapter> <appendix> <title>W3C production rules</title> <productionset> <title><link xlink:href="http://www.w3.org/TR/2008/REC-xml-20081126/#charsets">Characters</link></title> <production xml:id="w3RecXml_NT-Letter"> <lhs>Letter</lhs> <rhs><nonterminal def="#w3RecXml_NT-BaseChar">BaseChar</nonterminal> | <nonterminal def="#w3RecXml_NT-Ideographic">Ideographic</nonterminal></rhs> </production> <production xml:id="w3RecXml_NT-BaseChar"> <lhs>BaseChar</lhs> <rhs>[#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x00FF] | [#x0100-#x0131] | [#x0134-#x013E] |...(values omitted here, see W3C documentation)</rhs> </production> <production xml:id="w3RecXml_NT-Ideographic"> <lhs>Ideographic</lhs> <rhs>[#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]</rhs> </production> <production xml:id="w3RecXml_NT-CombiningChar"> <lhs>CombiningChar</lhs> <rhs>[#x0300-#x0345] | ...(values omitted here)</rhs> </production> <production xml:id="w3RecXml_NT-Digit"> 
<lhs>Digit</lhs> <rhs>[#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9] | [#x0966-#x096F] | [#x09E6-#x09EF] | [#x0A66-#x0A6F] | [#x0AE6-#x0AEF] | [#x0B66-#x0B6F] | [#x0BE7-#x0BEF] | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] | [#x0D66-#x0D6F] | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29]</rhs> </production> <production xml:id="w3RecXml_NT-Extender"> <lhs>Extender</lhs> <rhs>#x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6 | #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]</rhs> </production> </productionset> </appendix> <appendix> <title>Glossary</title> <para/> <glossary> <glossentry xml:id="gloss_API"> <glossterm><abbrev xlink:href="http://en.wikipedia.org/wiki/Api" xml:id="abbr_api">API</abbrev></glossterm> <glossdef> <para>Application programming interface</para> </glossdef> </glossentry> <glossentry xml:id="gloss_SqlDdl"> <glossterm><abbrev xlink:href="http://en.wikipedia.org/wiki/Data_definition_language" xml:id="abbr_Ddl">DDL</abbrev> <link linkend="gloss_SQL">(SQL)</link></glossterm> <glossdef> <para>Data definition language. The subset of <link linkend="gloss_SQL">SQL</link> dealing with the creation of tables, views etc.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_DOM"> <glossterm><acronym xlink:href="http://www.w3.org/DOM" xml:id="abbr_Dom">DOM</acronym></glossterm> <glossdef> <para>The <link linkend="gloss_W3C">W3C</link> <link xlink:href="http://www.w3.org/DOM">Document Object Model</link> standard</para> </glossdef> </glossentry> <glossentry xml:id="gloss_DTD"> <glossterm><abbrev xlink:href="http://en.wikipedia.org/wiki/Document_Type_Declaration" xml:id="abbr_Dtd">DTD</abbrev></glossterm> <glossdef> <para>Document Type Definition. 
A standard older than <link linkend="gloss_RelaxNG">RelaxNG</link> and <link linkend="gloss_XmlSchema">XML Schema</link> for defining an XML document's grammar.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_EBNF"> <glossterm><abbrev>EBNF</abbrev></glossterm> <glossdef> <para>Extended Backus-Naur form.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_ftp"> <glossterm><abbrev xlink:href="http://en.wikipedia.org/wiki/File_Transfer_Protocol" xml:id="abbr_Ftp">ftp</abbrev></glossterm> <glossdef> <para>File Transfer Protocol</para> </glossdef> </glossentry> <glossentry xml:id="gloss_FO"> <glossterm><abbrev xlink:href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice6.html#fo-section" xml:id="abbr_Fo">FO</abbrev></glossterm> <glossdef> <para>The Formatting Objects Standard for printable output generation</para> </glossdef> </glossentry> <glossentry xml:id="gloss_HDM"> <glossterm><orgname xlink:href="http://www.hdm-stuttgart.de" xml:id="org_Hdm">Hdm</orgname></glossterm> <glossdef> <para xml:lang="de">Hochschule der Medien.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_Hql"> <glossterm><abbrev xlink:href="http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch16.html" xml:id="abbr_Hql">HQL</abbrev></glossterm> <glossdef> <para>The <link xlink:href="http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/queryhql.html">Hibernate Query Language</link>.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_http"> <glossterm><abbrev xlink:href="http://www.w3.org/Protocols" xml:id="abbr_Http">http</abbrev></glossterm> <glossdef> <para>The Hypertext Transfer Protocol</para> </glossdef> </glossentry> <glossentry xml:id="gloss_IDE"> <glossterm><abbrev xlink:href="http://en.wikipedia.org/wiki/Integrated_development_environment" xml:id="abbr_Ide">IDE</abbrev></glossterm> <glossdef> <para>Integrated Development Environment</para> </glossdef> </glossentry> <glossentry xml:id="gloss_J2EE"> <glossterm><trademark
xlink:href="http://www.oracle.com/technetwork/java/javaee" xml:id="tm_J2ee">J2EE</trademark></glossterm> <glossdef> <para>Java Platform, Enterprise Edition</para> </glossdef> </glossentry> <glossentry xml:id="gloss_Java"> <glossterm><trademark xlink:href="http://www.oracle.com/us/legal/third-party-trademarks/index.html">Java</trademark></glossterm> <glossdef> <para>General purpose programming language with support for object oriented concepts.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_Javadoc"> <glossterm><trademark xlink:href="http://docs.oracle.com/javase/1.5.0/docs/guide/javadoc">Javadoc</trademark></glossterm> <glossdef> <para>Tool for extracting documentation embedded in <link linkend="gloss_Java"><trademark>Java</trademark></link> source code.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_JDBC"> <glossterm><trademark xlink:href="http://www.oracle.com/technetwork/java/javase/jdbc" xml:id="tm_Jdbc">JDBC</trademark></glossterm> <glossdef> <para>Java Database Connectivity, the <link linkend="gloss_Java"><trademark>Java</trademark></link> <link linkend="gloss_API">API</link> for accessing relational databases.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_JDK"> <glossterm><trademark xlink:href="http://www.oracle.com/technetwork/java/javase" xml:id="tm_Jdk">JDK</trademark></glossterm> <glossdef> <para>Java Development Kit.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_JPA"> <glossterm><abbrev xlink:href="http://www.javaworld.com/javaworld/jw-01-2008/jw-01-jpa1.html" xml:id="abbr_Jpa">JPA</abbrev></glossterm> <glossdef> <para><link xlink:href="http://www.javaworld.com/javaworld/jw-01-2008/jw-01-jpa1.html">Java Persistence API</link></para> </glossdef> </glossentry> <glossentry xml:id="gloss_JRE"> <glossterm><trademark xlink:href="http://www.oracle.com/technetwork/java/javase" xml:id="tm_Jre">JRE</trademark></glossterm> <glossdef> <para>Java Runtime Environment</para> </glossdef> </glossentry> <glossentry xml:id="gloss_MathML"> <glossterm><abbrev>MathML</abbrev></glossterm> <glossdef> <para><link xlink:href="http://www.w3.org/Math">Mathematical Markup
Language</link></para> </glossdef> </glossentry> <glossentry xml:id="gloss_MIB"> <glossterm><orgname xlink:href="http://www.mi.hdm-stuttgart.de" xml:id="org_Mib">MIB</orgname></glossterm> <glossdef> <para xml:lang="de">Bachelor Studiengang Medieninformatik</para> </glossdef> </glossentry> <glossentry xml:id="gloss_Mysql"> <glossterm><trademark xlink:href="http://www.mysql.com/about/legal/trademark.html" xml:id="tm_Mysql">Mysql</trademark></glossterm> <glossdef> <para>Open source Oracle database product</para> </glossdef> </glossentry> <glossentry xml:id="gloss_MP3"> <glossterm><abbrev>MP3</abbrev></glossterm> <glossdef> <para>Audio codec.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_ORM"> <glossterm><abbrev>ORM</abbrev></glossterm> <glossdef> <para>Object relational mapping.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_PHP"> <glossterm><abbrev xlink:href="http://www.php.net">PHP</abbrev></glossterm> <glossdef> <para>Hypertext preprocessor</para> </glossdef> </glossentry> <glossentry xml:id="gloss_RelaxNG"> <glossterm><acronym xlink:href="http://relaxng.org">RelaxNG</acronym></glossterm> <glossdef> <para>An <link xlink:href="http://standards.iso.org/ittf/PubliclyAvailableStandards/c037605_ISO_IEC_19757-2_2003(E).zip">ISO</link> standard to define the grammar of XML documents. 
Primary use for document oriented applications.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_SAX"> <glossterm><acronym xlink:href="http://www.saxproject.org">SAX</acronym></glossterm> <glossdef> <para><link xlink:href="http://www.saxproject.org">Simple API for XML</link>.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_SQL"> <glossterm><acronym xlink:href="http://en.wikipedia.org/wiki/Sql">SQL</acronym></glossterm> <glossdef> <para><link xlink:href="http://en.wikipedia.org/wiki/SQL">Structured query language</link>.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_SVG"> <glossterm><abbrev>SVG</abbrev></glossterm> <glossdef> <para><link xlink:href="http://www.w3.org/Graphics/SVG">Scalable Vector Graphics</link>.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_TCP"> <glossterm><acronym xlink:href="http://en.wikipedia.org/wiki/Transmission_Control_Protocol" xml:id="abbr_Tcp">TCP</acronym></glossterm> <glossdef> <para>Transmission Control Protocol</para> </glossdef> </glossentry> <glossentry xml:id="gloss_URL"> <glossterm><abbrev xlink:href="http://www.ietf.org/rfc/rfc1738.txt" xml:id="abbr_Url">URL</abbrev></glossterm> <glossdef> <para>Uniform Resource Locator</para> </glossdef> </glossentry> <glossentry xml:id="gloss_W3C"> <glossterm><orgname xlink:href="http://www.w3.org">W3C</orgname></glossterm> <glossdef> <para>World Wide Web Consortium</para> </glossdef> </glossentry> <glossentry xml:id="gloss_XHTML"> <glossterm><abbrev>XHTML</abbrev></glossterm> <glossdef> <para>Html as <link linkend="gloss_XML">XML</link> <link xlink:href="http://www.w3.org/TR/xhtml11">standard</link>.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_XML"> <glossterm><abbrev xlink:href="http://www.w3.org/XML">Xml</abbrev></glossterm> <glossdef> <para>The <link xlink:href="http://www.w3.org/XML">Extensible Markup Language</link>.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_XmlSchema"> <glossterm>XML Schema</glossterm> 
<glossdef> <para>A W3C standard to define grammars for XML documents. Rich set of features with respect to data modeling.</para> </glossdef> </glossentry> <glossentry xml:id="gloss_XPath"> <glossterm><acronym xlink:href="http://www.w3.org/TR/xpath" xml:id="abbr_Xpath">XPath</acronym></glossterm> <glossdef> <para>XML Path Language</para> </glossdef> </glossentry> <glossentry xml:id="gloss_XSD"> <glossterm><abbrev xlink:href="http://www.w3.org/XML/Schema">XSD</abbrev></glossterm> <glossdef> <para>XML Schema Definition Language</para> </glossdef> </glossentry> <glossentry xml:id="gloss_XSL"> <glossterm><abbrev xlink:href="http://www.w3.org/Style/XSL" xml:id="abbr_Xsl">XSL</abbrev></glossterm> <glossdef> <para>Extensible Stylesheet Language</para> </glossdef> </glossentry> </glossary> </appendix> <xi:include href="../glossary.xml" xpointer="element(/1)"/> <xi:include href="../bibliography.xml" xpointer="element(/1)"/> </part>