org.apache.xml.serialize

Class HTMLSerializer

Implemented Interfaces:
ContentHandler, DeclHandler, DocumentHandler, DTDHandler, LexicalHandler, DOMSerializer, Serializer
Known Direct Subclasses:
XHTMLSerializer

public class HTMLSerializer
extends BaseMarkupSerializer

Implements an HTML/XHTML serializer supporting both DOM and SAX pretty serializing. HTML/XHTML mode is determined in the constructor. For usage instructions see Serializer.

If an output stream is used, the encoding is taken from the output format (defaults to UTF-8). If a writer is used, make sure the writer uses the same encoding (if applicable) as specified in the output format.
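As a rough illustration of the writer case, the sketch below builds a writer with an explicit charset so that the bytes produced match the encoding name given to the output format. The class and method names (EncodingMatchSketch, writerFor, encode) are hypothetical and use only the JDK; the same encoding string would also be set on the OutputFormat passed to the serializer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper, not part of org.apache.xml.serialize.
final class EncodingMatchSketch {
    /** Wraps a byte stream in a writer that encodes with the named charset. */
    static Writer writerFor(OutputStream out, String encoding) {
        // Avoid the platform default charset: name the encoding explicitly.
        return new OutputStreamWriter(out, Charset.forName(encoding));
    }

    /** Encodes text through writerFor and returns the resulting bytes. */
    static byte[] encode(String text, String encoding) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = writerFor(bytes, encoding)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

The same character can occupy a different number of bytes depending on the charset, which is why a mismatch between the writer and the declared encoding can yield unreadable output.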

The serializer supports both DOM and SAX. DOM serializing is done by calling serialize (inherited from BaseMarkupSerializer), and SAX serializing is done by firing SAX events and using the serializer as a document handler.
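A minimal sketch of the DOM route, assuming the Xerces serializer classes (org.apache.xml.serialize) are on the classpath. The class name DomSerializeSketch and the method render are hypothetical; the document built here is only a placeholder.

```java
import java.io.IOException;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.xml.serialize.HTMLSerializer;
import org.apache.xml.serialize.Method;
import org.apache.xml.serialize.OutputFormat;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; requires the Xerces jar in addition to the JDK.
final class DomSerializeSketch {
    /** Builds a tiny html/body/p document and serializes it as HTML. */
    static String render(String paragraphText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            p.appendChild(doc.createTextNode(paragraphText));
            body.appendChild(p);
            html.appendChild(body);
            doc.appendChild(html);

            // HTML method, UTF-8 encoding, indenting enabled.
            OutputFormat format = new OutputFormat(Method.HTML, "UTF-8", true);
            StringWriter out = new StringWriter();
            HTMLSerializer serializer = new HTMLSerializer(out, format);
            serializer.serialize(doc); // DOM serializing
            return out.toString();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the SAX route, the inherited asContentHandler or asDocumentHandler methods listed below give the handler to which events are fired.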

If an I/O exception occurs while serializing, the serializer will not throw it immediately, but only at the end of serializing (either DOM serializing or SAX's org.xml.sax.DocumentHandler.endDocument).
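The deferred-error behavior described above can be pictured with the following JDK-only sketch. It shows the general pattern (remember the first failure, rethrow it at the end) and is not the serializer's actual code; DeferredErrorSketch and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical illustration of "record now, throw at the end".
final class DeferredErrorSketch {
    private final Writer out;
    private IOException firstError; // remembered, not thrown immediately

    DeferredErrorSketch(Writer out) {
        this.out = out;
    }

    /** Writes text; on failure, records the first error and stays silent. */
    void print(String text) {
        if (firstError != null) {
            return; // output is already broken, skip further writes
        }
        try {
            out.write(text);
        } catch (IOException e) {
            firstError = e;
        }
    }

    /** Called once at the end; surfaces any error recorded earlier. */
    void finish() throws IOException {
        if (firstError != null) {
            throw firstError;
        }
        out.flush();
    }

    /** Returns true if a failing writer's error surfaces only at finish(). */
    static boolean demo() {
        Writer broken = new Writer() {
            @Override public void write(char[] buf, int off, int len) throws IOException {
                throw new IOException("disk full");
            }
            @Override public void flush() { }
            @Override public void close() { }
        };
        DeferredErrorSketch s = new DeferredErrorSketch(broken);
        s.print("<p>");  // fails internally, but no exception reaches the caller
        s.print("text");
        try {
            s.finish();
            return false;
        } catch (IOException e) {
            return "disk full".equals(e.getMessage());
        }
    }
}
```

A practical consequence is that callers should treat the end-of-document call as the point where output failures become visible.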

For elements that are not specified as whitespace preserving, the serializer may break long text lines at space boundaries, indent lines, and serialize elements on separate lines. Line terminators are treated as spaces, and spaces at the beginning of a line are stripped.
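To make the line-breaking rule concrete, here is a simplified JDK-only sketch of re-flowing text at space boundaries. It is an illustration under assumptions, not the serializer's algorithm: WrapSketch and wrap are hypothetical names, indentation is omitted, and runs of whitespace are collapsed to one space for brevity.

```java
// Hypothetical illustration of breaking long lines at space boundaries.
final class WrapSketch {
    /** Re-flows text to lineWidth columns where possible, breaking only at spaces. */
    static String wrap(String text, int lineWidth) {
        // Line terminators count as spaces; leading spaces are dropped by trim().
        String[] words = text.trim().split("\\s+");
        StringBuilder out = new StringBuilder();
        int column = 0;
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // the input was blank
            }
            if (column > 0 && column + 1 + word.length() > lineWidth) {
                out.append('\n'); // break at the space boundary
                column = 0;
            } else if (column > 0) {
                out.append(' ');
                column++;
            }
            out.append(word);
            column += word.length();
        }
        return out.toString();
    }
}
```

A word longer than the line width is never split in this sketch; it simply overflows its line.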

XHTML is slightly different than HTML:

Version:
$Revision: 464300 $ $Date: 2006-10-15 17:56:26 -0400 (Sun, 15 Oct 2006) $
Author:
Assaf Arkin
See Also:
Serializer

Field Summary

static String
XHTMLNamespace

Fields inherited from class org.apache.xml.serialize.BaseMarkupSerializer

_docTypePublicId, _docTypeSystemId, _encodingInfo, _format, _indenting, _prefixes, _printer, _started, fCurrentNode, fDOMError, fDOMErrorHandler, fDOMFilter, fStrBuffer, features

Constructor Summary

HTMLSerializer()
Constructs a new serializer.
HTMLSerializer(OutputStream output, OutputFormat format)
Constructs a new serializer that writes to the specified output stream using the specified output format.
HTMLSerializer(Writer writer, OutputFormat format)
Constructs a new serializer that writes to the specified writer using the specified output format.
HTMLSerializer(boolean xhtml, OutputFormat format)
Constructs a new HTML/XHTML serializer depending on the value of xhtml.
HTMLSerializer(OutputFormat format)
Constructs a new serializer.

Method Summary

protected void
characters(String text)
Called to print the text contents in the prevailing element format.
void
characters(char[] chars, int start, int length)
void
endElement(String tagName)
void
endElement(String namespaceURI, String localName, String rawName)
void
endElementIO(String namespaceURI, String localName, String rawName)
protected String
escapeURI(String uri)
protected String
getEntityRef(int ch)
Returns the suitable entity reference for this character value, or null if no such entity exists.
protected void
serializeElement(Element elem)
Called to serialize a DOM element.
void
setOutputFormat(OutputFormat format)
Specifies an output format for this serializer.
void
setXHTMLNamespace(String newNamespace)
protected void
startDocument(String rootTagName)
Called to serialize the document's DOCTYPE by the root element.
void
startElement(String tagName, AttributeList attrs)
void
startElement(String namespaceURI, String localName, String rawName, Attributes attrs)

Methods inherited from class org.apache.xml.serialize.BaseMarkupSerializer

asContentHandler, asDOMSerializer, asDocumentHandler, attributeDecl, characters, characters, checkUnboundNamespacePrefixedNode, comment, comment, content, elementDecl, endCDATA, endDTD, endDocument, endEntity, endNonEscaping, endPrefixMapping, endPreserving, enterElementState, externalEntityDecl, fatalError, getElementState, getEntityRef, getPrefix, ignorableWhitespace, internalEntityDecl, isDocumentState, leaveElementState, modifyDOMError, notationDecl, prepare, printCDATAText, printDoctypeURL, printEscaped, printEscaped, printText, printText, processingInstruction, processingInstructionIO, reset, serialize, serialize, serialize, serializeElement, serializeNode, serializePreRoot, setDocumentLocator, setOutputByteStream, setOutputCharStream, setOutputFormat, skippedEntity, startCDATA, startDTD, startDocument, startEntity, startNonEscaping, startPrefixMapping, startPreserving, surrogates, unparsedEntityDecl

Field Details

XHTMLNamespace

public static final String XHTMLNamespace

Constructor Details

HTMLSerializer

public HTMLSerializer()
Constructs a new serializer. The serializer cannot be used without calling setOutputCharStream or setOutputByteStream (both inherited from BaseMarkupSerializer) first.

HTMLSerializer

public HTMLSerializer(OutputStream output,
                      OutputFormat format)
Constructs a new serializer that writes to the specified output stream using the specified output format. If format is null, will use a default output format.
Parameters:
output - The output stream to use
format - The output format to use, null for the default

HTMLSerializer

public HTMLSerializer(Writer writer,
                      OutputFormat format)
Constructs a new serializer that writes to the specified writer using the specified output format. If format is null, will use a default output format.
Parameters:
writer - The writer to use
format - The output format to use, null for the default

HTMLSerializer

protected HTMLSerializer(boolean xhtml,
                         OutputFormat format)
Constructs a new HTML/XHTML serializer depending on the value of xhtml. The serializer cannot be used without calling setOutputCharStream or setOutputByteStream (both inherited from BaseMarkupSerializer) first.
Parameters:
xhtml - True if XHTML serializing

HTMLSerializer

public HTMLSerializer(OutputFormat format)
Constructs a new serializer. The serializer cannot be used without calling setOutputCharStream or setOutputByteStream (both inherited from BaseMarkupSerializer) first.

Method Details

characters

protected void characters(String text)
            throws IOException
Called to print the text contents in the prevailing element format. Since this method is capable of printing text as CDATA, it is used for that purpose as well. White space handling is determined by the current element state. In addition, the output format can dictate whether the text is printed as CDATA or unescaped.
Overrides:
characters in class BaseMarkupSerializer
Parameters:
text - The text to print

characters

public void characters(char[] chars,
                       int start,
                       int length)
            throws SAXException
Overrides:
characters in class BaseMarkupSerializer

endElement

public void endElement(String tagName)
            throws SAXException

endElement

public void endElement(String namespaceURI,
                       String localName,
                       String rawName)
            throws SAXException

endElementIO

public void endElementIO(String namespaceURI,
                         String localName,
                         String rawName)
            throws IOException

escapeURI

protected String escapeURI(String uri)

getEntityRef

protected String getEntityRef(int ch)
Returns the suitable entity reference for this character value, or null if no such entity exists. Calling this method with '&' will return "&amp;".
Overrides:
getEntityRef in class BaseMarkupSerializer
Parameters:
ch - Character value
Returns:
Character entity name, or null

serializeElement

protected void serializeElement(Element elem)
            throws IOException
Called to serialize a DOM element. Equivalent to calling startElement, endElement and serializing everything in between, but better optimized.
Overrides:
serializeElement in class BaseMarkupSerializer

setOutputFormat

public void setOutputFormat(OutputFormat format)
Specifies an output format for this serializer. If the serializer has already been associated with an output format, it will switch to the new format. This method should not be called while the serializer is in the process of serializing a document.
Specified by:
setOutputFormat in interface Serializer
Overrides:
setOutputFormat in class BaseMarkupSerializer
Parameters:
format - The output format to use

setXHTMLNamespace

public void setXHTMLNamespace(String newNamespace)

startDocument

protected void startDocument(String rootTagName)
            throws IOException
Called to serialize the document's DOCTYPE by the root element. The document type declaration must name the root element, but the root element is only known when that element is serialized, and not at the start of the document.

This method will check whether it has been called before (BaseMarkupSerializer._started), will serialize the document type declaration, and will serialize all pre-root comments and PIs that were accumulated in the document (see BaseMarkupSerializer.serializePreRoot). Pre-root will be serialized even if this is not the first root element of the document.
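The ordering problem described above (the DOCTYPE needs the root name, which arrives late) can be sketched with the JDK alone. This is a simplified illustration, not the serializer's implementation: DeferredDoctypeSketch and its methods are hypothetical, only comments are buffered, and the DOCTYPE carries no public or system identifier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of deferring the DOCTYPE until the root is known.
final class DeferredDoctypeSketch {
    private final StringBuilder out = new StringBuilder();
    private final List<String> preRoot = new ArrayList<>(); // held back until the root
    private boolean started; // plays the role of a "_started" flag

    void comment(String text) {
        String markup = "<!--" + text + "-->";
        if (!started) {
            preRoot.add(markup); // root name unknown yet, so buffer it
        } else {
            out.append(markup).append('\n');
        }
    }

    void startElement(String tagName) {
        if (!started) {
            // First element seen: its name is the root name the DOCTYPE needs.
            out.append("<!DOCTYPE ").append(tagName).append(">\n");
            started = true;
        }
        // Flush anything accumulated before this element.
        for (String item : preRoot) {
            out.append(item).append('\n');
        }
        preRoot.clear();
        out.append('<').append(tagName).append(">\n");
    }

    String result() {
        return out.toString();
    }

    static String demo() {
        DeferredDoctypeSketch s = new DeferredDoctypeSketch();
        s.comment(" generated ");
        s.startElement("html");
        return s.result();
    }
}
```

In this sketch the buffered comment is emitted after the DOCTYPE and before the root element's start tag, matching the order given in the description.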


startElement

public void startElement(String tagName,
                         AttributeList attrs)
            throws SAXException

startElement

public void startElement(String namespaceURI,
                         String localName,
                         String rawName,
                         Attributes attrs)
            throws SAXException

Copyright © 1999-2007 The Apache Software Foundation. All Rights Reserved.