Dbtotexi


Introduction

This document describes dbtotexi, a simple utility for converting XML documents that conform to a subset of the DocBook DTD into GNU texinfo format. The dbtotexi program is implemented using the XSL Transformations language, as described in the working draft http://www.w3.org/TR/1999/WD-xslt-19990421. A Java-based XSL engine(1) carries out the actual transformation, as determined by the style sheet dbtotexi.xsl. A small amount of additional Java code provides a few utility routines that the XSL implementation does not provide.


License

This software is subject to the terms of the GNU General Public License. Please see the file COPYING for details. The license terms that apply to the supplied third party software contained in the files sax.jar, xp.jar and xt.jar are specified in the files sax-copying.txt, xp-copying.txt and xt-copying.txt respectively.


Installation

Once the tar archive has been unpacked(2), check the Makefile to see whether the settings at the top are suitable for your site, then type make followed by make install. By default, the dbtotexi bash shell script is installed in /usr/local/bin and the support files in /usr/local/share/dbtotexi. A compiled version of the Java support code is supplied, so you do not need a Java compiler unless you change the Java code.

The installation defaults to using Sun's jre VM, but any JDK 1.1 compliant implementation (such as Kaffe(3)) should work. No GUI facilities or additional libraries are required. If you use a different VM, the shell script dbtotexi.sh may need editing.


Usage

A DocBook source file, foo.xml, is converted to texinfo format very simply:

dbtotexi foo.xml

This will produce output in foo.texinfo. The name of the output file can be given explicitly as a second argument; if it is specified as -, the output is sent to stdout. A third argument specifies the name of the info file to produce; it defaults to the input file name modified to have a .info suffix. Any DocBook elements that are not recognised (either because of an error in the input document or because the translator does not yet support that element) are reported to stderr and shown in bold in the output.
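The argument rules above can be summarised in a short Java sketch. It is hypothetical: the real handling lives in the dbtotexi shell script, and the class name DbtotexiArgs, its methods, and the treatment of input names without a suffix (.texinfo or .info is simply appended) are assumptions made for illustration.

```java
/**
 * Hypothetical sketch of dbtotexi's argument defaults. The real logic is in
 * the dbtotexi shell script; these names are illustrative only.
 */
final class DbtotexiArgs {
    final String input;    // DocBook XML source, e.g. foo.xml
    final String output;   // texinfo output file, or "-" for stdout
    final String infoFile; // name of the info file to produce

    private DbtotexiArgs(String input, String output, String infoFile) {
        this.input = input;
        this.output = output;
        this.infoFile = infoFile;
    }

    // Replaces the last suffix of name (if any) with the given suffix.
    static String withSuffix(String name, String suffix) {
        int slash = name.lastIndexOf('/');
        int dot = name.lastIndexOf('.');
        // Only a dot in the final path component starts a suffix.
        String base = (dot > slash) ? name.substring(0, dot) : name;
        return base + suffix;
    }

    // Applies the defaults described in the Usage section.
    static DbtotexiArgs resolve(String[] args) {
        if (args.length < 1 || args.length > 3) {
            throw new IllegalArgumentException(
                "usage: dbtotexi input.xml [output.texinfo | -] [file.info]");
        }
        String input = args[0];
        String output = (args.length >= 2) ? args[1] : withSuffix(input, ".texinfo");
        String info = (args.length >= 3) ? args[2] : withSuffix(input, ".info");
        return new DbtotexiArgs(input, output, info);
    }

    // True when the texinfo output should go to stdout.
    boolean toStdout() {
        return "-".equals(output);
    }

    public static void main(String[] argv) {
        DbtotexiArgs a = resolve(new String[] {"foo.xml"});
        System.out.println(a.output + " " + a.infoFile); // foo.texinfo foo.info
        DbtotexiArgs b = resolve(new String[] {"foo.xml", "-", "manual.info"});
        System.out.println(b.toStdout() + " " + b.infoFile); // true manual.info
    }
}
```

Note that in this sketch the info file name is always derived from the input name, even when the output goes to stdout.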

A document that conforms to the SGML DocBook DTD must first be converted to XML before it can be processed by dbtotexi. This can be done using the sx program that is part of James Clark's SP SGML toolset. Typical usage would be:

sx -xlower foo.sgm > foo.xml

Note

The XML version of the DocBook DTD is not actually required by the conversion process (but see The texinfo Processing Instruction). In fact, if the document to be converted does not contain a DOCTYPE declaration, the conversion is somewhat quicker. Whether or not the document contains a DOCTYPE declaration, it should be valid (i.e. it should conform to the DocBook XML DTD).


Role Attributes

This section describes how the translation of some elements is influenced by the setting of the element's role attribute.


indexterm
The role attribute can be set to one of c, f, v, k, p or d to indicate which index the entry should be entered in. If the role attribute is not specified, the entry is entered in the concept index by default.
index
The role attribute can be set to one of c, f, v, k, p or d to indicate which index should be output. If the role attribute is not specified, the concept index is output by default.
variablelist
The role attribute can be set to bold or fixed to indicate that the list's terms should be displayed in a bold or fixed-width font respectively. If the role attribute is not specified, the list's terms are displayed "as is".
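As an illustration only, the role handling above might look like the following sketch. It assumes, without confirmation from dbtotexi.xsl, that an indexterm's role letter becomes the prefix of a texinfo index command (c gives @cindex) and that bold and fixed terms are wrapped in @b{...} and @t{...}. RoleMapping and its methods are hypothetical names. Texinfo predefines @cindex, @findex, @vindex, @kindex and @pindex, so under this scheme a d index would also need an @defindex d declaration.

```java
/**
 * Illustrative sketch of role-attribute handling. The mapping to texinfo
 * commands is an assumption, not taken from dbtotexi.xsl.
 */
final class RoleMapping {
    // Index roles accepted by indexterm and index.
    private static final String INDEX_ROLES = "cfvkpd";

    // Index command for an indexterm, e.g. role "f" gives "@findex".
    static String indexCommand(String role) {
        if (role == null || role.isEmpty()) {
            role = "c"; // no role: use the concept index
        }
        if (role.length() != 1 || INDEX_ROLES.indexOf(role.charAt(0)) < 0) {
            throw new IllegalArgumentException("unknown index role: " + role);
        }
        return "@" + role + "index";
    }

    // Formats a variablelist term according to the list's role.
    static String formatTerm(String role, String term) {
        if ("bold".equals(role)) {
            return "@b{" + term + "}"; // bold font
        }
        if ("fixed".equals(role)) {
            return "@t{" + term + "}"; // fixed-width font
        }
        return term; // no role (or, in this sketch, any other value): as is
    }

    public static void main(String[] args) {
        System.out.println(indexCommand(null) + " " + indexCommand("v")); // @cindex @vindex
        System.out.println(formatTerm("fixed", "make install"));          // @t{make install}
    }
}
```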


The texinfo Processing Instruction

The texinfo processing instruction can be used within a document to insert arbitrary markup into the output: its content is passed through unchanged, so the characters @, { and } are not escaped. This facility can be used to define entities that contain texinfo markup. For example, given the following general entity declaration in the DTD subset:

<!ENTITY hellip "<?texinfo @dots{}?>">

One can then write &hellip; and expect to get dots (...) in the output!
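The difference between ordinary text and texinfo processing-instruction content can be sketched in Java. Ordinary text normally has to be escaped, because @, { and } are special in texinfo and are written as @@, @{ and @}; processing-instruction content skips that step. TexinfoEscape and its methods are hypothetical names, not part of dbtotexi.

```java
/**
 * Sketch contrasting escaped character data with verbatim texinfo PI
 * content. Names are hypothetical, not part of dbtotexi.
 */
final class TexinfoEscape {
    // Escapes texinfo's special characters by prefixing each with '@'.
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '@' || c == '{' || c == '}') {
                out.append('@');
            }
            out.append(c);
        }
        return out.toString();
    }

    // Output for a processing instruction: texinfo PI data passes through as is.
    static String processingInstruction(String target, String data) {
        // Other targets (dircategory, direntry, ...) are not handled here.
        return "texinfo".equals(target) ? data : "";
    }

    public static void main(String[] args) {
        System.out.println(escapeText("user@example.com {x}"));          // user@@example.com @{x@}
        System.out.println(processingInstruction("texinfo", "@dots{}")); // @dots{}
    }
}
```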


The dircategory & direntry Processing Instructions

The dircategory and direntry processing instructions may be used to set the resulting info file's directory category and menu entry. These processing instructions are best positioned after the document type declaration but before the first element (<book> or <article>). Here's what this document uses:

<?dircategory Texinfo documentation system?>
<?direntry * Dbtotexi: (dbtotexi). DocBook to Texinfo convertor.?>
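Assuming the two instructions are translated into texinfo's @dircategory command and @direntry ... @end direntry block (this manual does not show the generated output), the header could be produced as in the following sketch. DirEntry.header is a hypothetical helper.

```java
/**
 * Sketch, assuming the dircategory and direntry PIs become texinfo's
 * @dircategory line and @direntry block. Names are hypothetical.
 */
final class DirEntry {
    // Builds the directory header; a null argument omits that part.
    static String header(String category, String entry) {
        StringBuilder sb = new StringBuilder();
        if (category != null) {
            sb.append("@dircategory ").append(category).append('\n');
        }
        if (entry != null) {
            sb.append("@direntry\n");
            sb.append(entry).append('\n');
            sb.append("@end direntry\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints:
        // @dircategory Texinfo documentation system
        // @direntry
        // * Dbtotexi: (dbtotexi). DocBook to Texinfo convertor.
        // @end direntry
        System.out.print(header("Texinfo documentation system",
                "* Dbtotexi: (dbtotexi). DocBook to Texinfo convertor."));
    }
}
```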


Support for Unicode Characters

A few Unicode characters are recognised in element content and converted into the equivalent texinfo commands. Unrecognised Unicode characters are passed through unchanged. Norman Walsh's DocBook XML DTD defines the ISO entity sets in terms of Unicode characters. The appendix Recognised Unicode Characters lists the characters that are currently recognised, together with the entities that yield them.
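This kind of translation can be sketched as a lookup table from characters to texinfo commands, with every other character copied unchanged. The table holds a few standard texinfo commands chosen for illustration; it is not dbtotexi's own table, UnicodeToTexinfo is a hypothetical name, and escaping of @, { and } is left out.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of Unicode-to-texinfo translation with pass-through for
 * unrecognised characters. The table is an illustrative subset.
 */
final class UnicodeToTexinfo {
    private static final Map<Character, String> COMMANDS = new HashMap<>();

    static {
        COMMANDS.put('\u00a3', "@pounds{}");    // pound sign
        COMMANDS.put('\u00a9', "@copyright{}"); // copyright sign
        COMMANDS.put('\u00df', "@ss{}");        // sharp s
        COMMANDS.put('\u00e7', "@,{c}");        // c with cedilla
        COMMANDS.put('\u00e9', "@'e");          // e with acute accent
        COMMANDS.put('\u00fc', "@\"u");         // u with diaeresis
        COMMANDS.put('\u2022', "@bullet{}");    // bullet
        COMMANDS.put('\u2026', "@dots{}");      // horizontal ellipsis
    }

    // Replaces recognised characters; all others are copied unchanged.
    static String convert(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String cmd = COMMANDS.get(c);
            if (cmd != null) {
                out.append(cmd);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("caf\u00e9 \u00a35 \u2026")); // caf@'e @pounds{}5 @dots{}
    }
}
```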


Caveats

A few points should be borne in mind:

  1. Only a small subset of the DocBook DTD has currently been implemented. Furthermore, of the elements that have been implemented, most of their attributes are ignored. As time goes by, the implementation will become more complete. However, some features of DocBook may never be implemented because of limitations in the texinfo format, and others because they are not considered useful enough. All contributions are welcome. Please send contributions and bug reports to markb@ordern.com.
  2. The XSL Transformations language has not yet been standardised and, therefore, applications that use it are subject to change. I envisage having to modify the XSL script to track the development of XSL and its implementations.
  3. It is possible that some existing SGML documents may require modification before they can be successfully converted to XML and hence into texinfo.


Links

More information can be found at these links:


http://www.w3.org/TR/WD-xslt
The latest version of the XSL Transformations (XSLT) Specification.
http://www.jclark.com/
James Clark's website contains much useful stuff including the XSLT engine xt and the SP toolset.
http://nwalsh.com/
Norman Walsh's website contains lots of DocBook and XML/XSL related stuff.
http://www.kaffe.org/
Home of the "Open Source" Kaffe Java VM.


Recognised Unicode Characters

The following table lists the set of Unicode characters that are currently recognised. Each row gives the character's code point in hexadecimal, how it is rendered in the output, and the name of the XML entity that yields it.

Unicode Character Rendered As Entity Name
00a0 nbsp
00a1 ¡ iexcl
00a3 £ pound
00a9 © copy
00bf ¿ iquest
00c6 Æ AElig
00df ß szlig
00e6 æ aelig
2022 bull
2026 ... hellip

0131 i inodot

00a8 ¨ uml
00e4 ä auml
00c4 Ä Auml
00eb ë euml
00cb Ë Euml
00ef ¨i iuml
00cf Ï Iuml
00f6 ö ouml
00d6 Ö Ouml
00fc ü uuml
00dc Ü Uuml
00ff ÿ yuml
0178 ¨Y Yuml

00b4 ´ acute
00e1 á aacute
00c1 Á Aacute
00e9 é eacute
00c9 É Eacute
00ed ´i iacute
00cd Í Iacute
00f3 ó oacute
00d3 Ó Oacute
00fa ú uacute
00da Ú Uacute
00fd ý yacute
00dd Ý Yacute
0107 ´c cacute
0106 ´C Cacute
01f5 ´g gacute
013a ´l lacute
0139 ´L Lacute
0144 ´n nacute
0143 ´N Nacute
0155 ´r racute
0154 ´R Racute
015b ´s sacute
015a ´S Sacute
017a ´z zacute
0179 ´Z Zacute

00b8 ¸ cedil
00e7 ç ccedil
00c7 Ç Ccedil
0122 ¸G Gcedil
0137 ¸k kcedil
0136 ¸K Kcedil
013c ¸l lcedil
013b ¸L Lcedil
0146 ¸n ncedil
0145 ¸N Ncedil
0157 ¸r rcedil
0156 ¸R Rcedil
015f ¸s scedil
015e ¸S Scedil
0163 ¸t tcedil
0162 ¸T Tcedil

00af ¯ macr
0101 amacr
0100 Amacr
0113 emacr
0112 Emacr
012a Imacr
012b imacr
014c Omacr
014d omacr
016b umacr
016a Umacr

00e2 â acirc
00c2 Â Acirc
00ea ê ecirc
00ca Ê Ecirc
00ee ^i icirc
00ce Î Icirc
00f4 ô ocirc
00d4 Ô Ocirc
00fb û ucirc
00db Û Ucirc
0109 ^c ccirc
0108 ^C Ccirc
011d ^g gcirc
011c ^G Gcirc
0125 ^h hcirc
0124 ^H Hcirc
0135 ^j jcirc
0134 ^J Jcirc
015d ^s scirc
015c ^S Scirc
0175 ^w wcirc
0174 ^W Wcirc
0177 ^y ycirc
0176 ^Y Ycirc

00e0 à agrave
00c0 À Agrave
00e8 è egrave
00c8 È Egrave
00ec `i igrave
00cc Ì Igrave
00f2 ò ograve
00d2 Ò Ograve
00f9 ù ugrave
00d9 Ù Ugrave

00e3 ã atilde
00c3 Ã Atilde
00f1 ñ ntilde
00d1 ~N Ntilde
00f5 õ otilde
00d5 Õ Otilde
0129 ~i itilde
0128 ~I Itilde
0169 ~u utilde
0168 ~U Utilde


Footnotes

  1. Currently, I am using James Clark's xt.

  2. You must have done that already to be reading this!

  3. http://www.kaffe.org/