XmlFormatter 0.1.4

Format and compress XML documents

MIT/X Consortium License 
P. Andreas Moeller
XmlFormatter is an open source Python class that formats XML documents. It differs from other formatters by handling whitespace with a distinct set of formatting rules (see below), treating element content as objects and mixed content as written text. Formatting is suspended for elements marked as preserved. It is most useful for tasks involving corrections or presentation. Typical usage looks like this::

from xmlformatter import Formatter

formatter = Formatter(indent="4")
print(formatter.format_file("/home/pa/doc.xml"))

The Object Style reflects the storage of object properties. All surrounding whitespace is therefore removed and sequences of whitespace are collapsed::

<complex>
 <real> 4.4E+12</real>
 <imaginary>5.4E-11
 </imaginary>
</complex>

The following shows the XML document formatted by Object Style::

<complex>
 <real>4.4E+12</real>
 <imaginary>5.4E-11</imaginary>
</complex>
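
The Object Style rules can be approximated with the standard library alone. The following is only a minimal sketch, not XmlFormatter's implementation: the helper names `collapse` and `object_style` are hypothetical, and the sketch assumes every element holds either child elements or text, never both. It prints the compact form without indentation.

```python
import re
import xml.etree.ElementTree as ET


def collapse(text):
    """Strip surrounding whitespace and collapse inner runs to one blank."""
    return re.sub(r"\s+", " ", text or "").strip()


def object_style(element):
    """Normalize an element tree in place (hypothetical helper).

    Assumes pure element content: whitespace between child elements is
    dropped, and the text of leaf elements is stripped and collapsed.
    """
    has_children = len(element) > 0
    element.text = None if has_children else collapse(element.text)
    element.tail = None
    for child in element:
        object_style(child)
    return element


source = "<complex>\n <real> 4.4E+12</real>\n <imaginary>5.4E-11\n </imaginary>\n</complex>"
root = object_style(ET.fromstring(source))
compact = ET.tostring(root, encoding="unicode")
print(compact)
```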

The Text Style reflects the storage of written text. Text is expected within mixed content. Leading and trailing whitespace is therefore moved from text nodes in nested elements to the surrounding text nodes. Note: if no surrounding text node can be found, xmlformatter inserts a text node containing a single blank outside the nested element. Sequences of whitespace are collapsed to a single blank::

 <poem> Es<em> war</em> einmal und <em>ist </em>nicht mehr...</poem>

Nested elements are handled like object properties, but whitespace is merged into the surrounding text nodes instead of being removed::

 <poem>Es <em>war</em> einmal und <em>ist</em> nicht mehr...</poem>
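
The Text Style behaviour can be sketched with `xml.etree.ElementTree`, where the text before the first child is `parent.text` and the text after each child is that child's `tail`. This is an illustration under assumptions, not XmlFormatter's own code: the names `squeeze` and `text_style` are hypothetical, and nested elements are assumed to contain text only.

```python
import re
import xml.etree.ElementTree as ET


def squeeze(text):
    """Collapse every run of whitespace to a single blank."""
    return re.sub(r"\s+", " ", text or "")


def text_style(parent):
    """Move edge whitespace out of nested elements (hypothetical helper)."""
    previous = None  # sibling whose tail precedes the current child
    for child in parent:
        text = child.text or ""
        if text[:1].isspace():
            # Leading whitespace goes to the preceding text node.
            if previous is None:
                parent.text = (parent.text or "") + " "
            else:
                previous.tail = (previous.tail or "") + " "
        if text[-1:].isspace():
            # Trailing whitespace goes to the following text node.
            child.tail = " " + (child.tail or "")
        child.text = squeeze(text.strip())
        previous = child
    # Collapse runs, then trim the outer edges of the parent's content.
    parent.text = squeeze(parent.text).lstrip()
    for child in parent:
        child.tail = squeeze(child.tail)
    if previous is None:
        parent.text = parent.text.rstrip()
    else:
        previous.tail = previous.tail.rstrip()
    return parent


poem = ET.fromstring("<poem> Es<em> war</em> einmal und <em>ist </em>nicht mehr...</poem>")
formatted_poem = ET.tostring(text_style(poem), encoding="unicode")
print(formatted_poem)
```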

Both styles are used together in an XML document. The formatting rules are:

A: surrounding whitespace is removed from element content

B: leading whitespace is removed from element content

C: trailing whitespace is removed from element content

D: leading whitespace in nested elements is moved to the preceding text node (or inserted) within mixed content

E: trailing whitespace in nested elements is moved to the following text node (or inserted) within mixed content

F: sequences of whitespace (n>0) are replaced by a single blank " " within element and mixed content

G: a line break and whitespace indent elements within element content

The following example marks the described whitespace with the rule labels within an XML document::

<root>AAAA<number>BBBB4.4E+12CCC</number>AAAA
AAAA<poem>BBBBEs<em>DDDDwar</em> einmal und <em>istEEEE</em>nicht mehrF
FFFFein <strong>riesengroßer</strong><em>DDDDTeddybär</em>,F
der aßFFFFdie <em>MilchEEEE</em>und trank das BrotFFFF
und als er starb da <strong>war erEEEE</strong><em>tot</em>.CCCC</poem>AAAA
</root>

The following shows the formatted XML document, with each marked whitespace run removed or replaced by a single blank::

<root>
 <number>4.4E+12</number>
 <poem>Es <em>war</em> einmal und <em>ist</em> nicht mehr ein <strong>riesengroßer</strong> <em>Teddybär</em>, der aß die <em>Milch</em> und trank das Brot und als er starb da <strong>war er</strong> <em>tot</em>.</poem>
</root>


Formatting can be influenced by several parameters passed when constructing the Formatter object. Elements that should be left unformatted are given in a list of element names, called preserve. All descendants of preserved elements are left unformatted as well::

 from xmlformatter import Formatter

 formatter = Formatter(preserve=["preserve"])
 print(formatter.format_file("/home/pa/doc.xml"))
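
How a preserve list affects a tree walk can be sketched as follows. This is a simplified illustration with the standard library, not the library's own traversal: the function name `strip_leaf_text` and the sample element names are assumptions. The walk stops at a preserved element, so its descendants stay untouched too.

```python
import xml.etree.ElementTree as ET


def strip_leaf_text(element, preserve=()):
    """Collapse whitespace in leaf text, skipping preserved subtrees."""
    if element.tag in preserve:
        # Leave this element and all of its descendants unformatted.
        return element
    if len(element) == 0:
        element.text = " ".join((element.text or "").split())
    for child in element:
        strip_leaf_text(child, preserve)
    return element


source = "<doc><title>  Hello   world </title><pre><code>  a   b </code></pre></doc>"
doc = strip_leaf_text(ET.fromstring(source), preserve=["pre"])
preserved_result = ET.tostring(doc, encoding="unicode")
print(preserved_result)
```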

The indentation width can be set by indent (default 2). The indenting character can be set by indentChar::

from xmlformatter import Formatter

formatter = Formatter(indent="1", indentChar="\t")
print(formatter.format_file("/home/pa/doc.xml"))
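
The effect of an indent width and an indenting character (rule G) can be sketched without the library. The code below is an assumption-laden illustration: `has_element_content` and `indent_tree` are hypothetical names, and the parameters `indent` and `indent_char` only mirror the options described above. Elements with mixed content are deliberately left alone.

```python
import xml.etree.ElementTree as ET


def has_element_content(element):
    """True if the element has children and only whitespace between them."""
    texts = [element.text] + [child.tail for child in element]
    return len(element) > 0 and not any((t or "").strip() for t in texts)


def indent_tree(element, indent=2, indent_char=" ", level=0):
    """Indent element content in place; mixed content is left untouched."""
    if not has_element_content(element):
        return element
    step = indent_char * indent
    inner = "\n" + step * (level + 1)
    element.text = inner
    for child in element:
        indent_tree(child, indent, indent_char, level + 1)
        child.tail = inner
    # The closing tag of this element sits one level further out.
    element[-1].tail = "\n" + step * level
    return element


tree = ET.fromstring("<complex><real>4.4E+12</real><imaginary>5.4E-11</imaginary></complex>")
indented = ET.tostring(indent_tree(tree, indent=1, indent_char="\t"), encoding="unicode")
print(indented)
```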

Indenting can be suppressed by setting compress to True or choosing indent=0::

from xmlformatter import Formatter

formatter = Formatter(compress=True)
print(formatter.format_file("/home/pa/doc.xml"))

The encoding of the input document can be set by encoding_input. By default, the encoding is UTF-8 or is read from the XML declaration. The encoding of the output can be set by encoding_output::

from xmlformatter import Formatter

formatter = Formatter(encoding_input="ISO-8859-1", encoding_output="ISO-8859-1")
print(formatter.format_file("/home/pa/doc.xml"))
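
The difference between input and output encoding can be seen with the standard library's ElementTree, which also sits on top of expat. This is only an illustration of the concept and does not call XmlFormatter: the input encoding is read from the XML declaration, and the output encoding is chosen when serializing.

```python
import xml.etree.ElementTree as ET

# Input bytes in ISO-8859-1; the declaration tells the parser how to decode them.
latin1_source = '<?xml version="1.0" encoding="ISO-8859-1"?><word>Teddybär</word>'.encode("ISO-8859-1")
word = ET.fromstring(latin1_source)

# The same tree serialized with two different output encodings.
as_utf8 = ET.tostring(word, encoding="UTF-8")
as_latin1 = ET.tostring(word, encoding="ISO-8859-1")

print(as_utf8)
print(as_latin1)
```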


XmlFormatter can parse XML documents given by path or as a string::

from xmlformatter import Formatter

formatter = Formatter()
# file
print(formatter.format_file("/home/pa/doc.xml"))
# string
formatted = formatter.format_string("<root>XML document</root>")


XmlFormatter includes a command line tool, xmlformat.py, that wraps the XmlFormatter class. The options are named like the parameters::

xmlformat [--preserve "pre,literal"] [--compress] [--indent num] [--outfile file] [--encoding enc] [--outencoding enc] [--help] <--infile file|file>

xmlformat.py can read from STDIN, like::

 cat /home/pa/doc.xml | python xmlformat.py
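
A wrapper with the options listed above could parse its arguments roughly like this. The sketch uses `argparse` and is not the actual xmlformat.py: the defaults, the helper name `build_parser`, and the rule that a missing input file means reading from STDIN are assumptions for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical)."""
    parser = argparse.ArgumentParser(prog="xmlformat")
    parser.add_argument("--preserve", default="", help="comma-separated element names")
    parser.add_argument("--compress", action="store_true")
    parser.add_argument("--indent", type=int, default=2)
    parser.add_argument("--outfile")
    parser.add_argument("--encoding")
    parser.add_argument("--outencoding")
    parser.add_argument("--infile")
    parser.add_argument("file", nargs="?")
    return parser


args = build_parser().parse_args(["--preserve", "pre,literal", "--indent", "4", "doc.xml"])
preserve_names = [name for name in args.preserve.split(",") if name]
input_path = args.infile or args.file  # None would mean: read from STDIN
print(preserve_names, args.indent, input_path)
```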


XmlFormatter is built on top of the expat parser and is therefore limited by expat. XmlFormatter is published under the MIT license.

Last updated on March 9th, 2012

