XmlFormatter is an open source Python class that formats XML documents. It differs from other formatters in how it handles whitespace: it applies a distinct set of formatting rules (see below) that treat element content as objects and mixed content as written text. Formatting is suspended for elements marked as preserved. XmlFormatter is most useful for tasks such as correcting XML documents or preparing them for presentation. Typical usage looks like this::
from xmlformatter import Formatter
formatter = Formatter(indent="4")
print(formatter.format_file("/home/pa/doc.xml"))
The Object Style reflects how object properties are stored. Therefore, all surrounding whitespace is removed and sequences of whitespace are collapsed. Consider the following unformatted document::
<complex>
<real> 4.4E+12</real>
<imaginary>5.4E-11
</imaginary>
</complex>
The following shows the XML document formatted in Object Style::
<complex>
  <real>4.4E+12</real>
  <imaginary>5.4E-11</imaginary>
</complex>
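A minimal sketch of the Object Style treatment of value-holding elements, using only Python's standard library. `object_style` is a hypothetical helper for illustration, not XmlFormatter's code; it assumes that only the four XML whitespace characters (space, tab, carriage return, line feed) count as whitespace.

```python
import re
import xml.etree.ElementTree as ET

# The four XML whitespace characters: space, tab, carriage return, line feed
XML_WS = re.compile(r"[ \t\r\n]+")


def object_style(text):
    """Collapse whitespace runs to one blank (rule F) and drop leading and
    trailing whitespace (rules B and C). Hypothetical helper."""
    return XML_WS.sub(" ", text).strip(" ")


doc = ET.fromstring("<complex>\n<real> 4.4E+12</real>\n"
                    "<imaginary>5.4E-11\n</imaginary>\n</complex>")
for leaf in doc:
    leaf.text = object_style(leaf.text)
print([leaf.text for leaf in doc])  # ['4.4E+12', '5.4E-11']
```

The sketch only normalizes the text inside the leaves; placing each leaf on its own indented line is a separate step.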
The Text Style reflects how written text is stored. Text is expected within mixed content. Therefore, leading and trailing whitespace is moved from the text nodes of nested elements to the surrounding text nodes. Note: if no such text node exists, XmlFormatter inserts a text node containing a single blank outside the nested element. Sequences of whitespace are collapsed to a single blank. Consider the following mixed content::
<poem> Es<em> war</em> einmal und <em>ist </em>nicht mehr...</poem>
The nested elements are handled like object properties, but their whitespace is merged into the surrounding text nodes instead of being removed::
<poem>Es <em>war</em> einmal und <em>ist</em> nicht mehr...</poem>
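The Text Style steps can be sketched with `xml.etree.ElementTree`, where a text node inside a nested element is its `text` and the text node after it is its `tail`. `text_style` is a hypothetical helper, not XmlFormatter's implementation; it handles a single level of nesting and assumes XML whitespace only (space, tab, CR, LF).

```python
import re
import xml.etree.ElementTree as ET

WS_CHARS = " \t\r\n"  # XML whitespace characters
WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse(s):
    # Rule F: every whitespace run becomes a single blank (None stays None)
    return WS_RUN.sub(" ", s) if s else s


def text_style(parent):
    """Apply rules B to F to one element with mixed content (one level deep).

    Hypothetical helper for illustration, not XmlFormatter's implementation.
    """
    children = list(parent)
    for i, child in enumerate(children):
        inner = child.text or ""
        if inner and inner[0] in WS_CHARS:
            # Rule D: move leading whitespace to the preceding text node,
            # creating that node (a single blank) if it does not exist
            child.text = inner = inner.lstrip(WS_CHARS)
            if i == 0:
                parent.text = (parent.text or "") + " "
            else:
                prev = children[i - 1]
                prev.tail = (prev.tail or "") + " "
        if inner and inner[-1] in WS_CHARS:
            # Rule E: move trailing whitespace to the following text node
            child.text = inner.rstrip(WS_CHARS)
            child.tail = " " + (child.tail or "")
    # Rule F, which also merges the blanks added above with existing ones
    parent.text = collapse(parent.text)
    for child in children:
        child.text = collapse(child.text)
        child.tail = collapse(child.tail)
    # Rules B and C: no whitespace at the start or end of the content
    if parent.text:
        parent.text = parent.text.lstrip(" ")
    if children and children[-1].tail:
        children[-1].tail = children[-1].tail.rstrip(" ")
    elif not children and parent.text:
        parent.text = parent.text.rstrip(" ")


poem = ET.fromstring("<poem> Es<em> war</em> einmal und <em>ist </em>"
                     "nicht mehr...</poem>")
text_style(poem)
print(ET.tostring(poem, encoding="unicode"))
# <poem>Es <em>war</em> einmal und <em>ist</em> nicht mehr...</poem>
```

Collapsing after the whitespace has been moved is what keeps `und <em> ist` from ending up with two blanks.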
Both styles are used together in an XML document. The formatting rules are:
A: surrounding whitespace is removed from element content
B: leading whitespace is removed from the content of an element
C: trailing whitespace is removed from the content of an element
D: within mixed content, leading whitespace in a nested element is moved to the preceding text node (which is inserted if missing)
E: within mixed content, trailing whitespace in a nested element is moved to the following text node (which is inserted if missing)
F: each sequence of whitespace characters (n>0) is replaced by a single blank " " within element and mixed content
G: elements within element content are indented by a line break and whitespace
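Which rules apply depends on whether an element holds text only, element content, or mixed content. The following sketch classifies an element with `xml.etree.ElementTree`; `content_kind` is a hypothetical helper, not part of XmlFormatter, and the rule mapping in its docstring is read off the labeled example in this document rather than taken from the library.

```python
import xml.etree.ElementTree as ET

WS_CHARS = " \t\r\n"  # XML whitespace characters


def content_kind(elem):
    """Classify an element's content (hypothetical helper).

    Rule labels as they appear in the labeled example:
      'empty'   - no child elements and no non-whitespace text
      'text'    - text only, like <number>: rules B, C and F
      'element' - child elements separated by whitespace only: rules A and G
      'mixed'   - child elements and text, like <poem>: rules B to F
    """
    has_text = bool((elem.text or "").strip(WS_CHARS)) or any(
        (child.tail or "").strip(WS_CHARS) for child in elem
    )
    if len(elem) == 0:
        return "text" if has_text else "empty"
    return "mixed" if has_text else "element"


root = ET.fromstring("<root>\n  <number> 4.4E+12 </number>\n"
                     "  <poem>Es <em>war</em> einmal</poem>\n  <empty/>\n</root>")
print([content_kind(e) for e in root.iter()])
# ['element', 'text', 'mixed', 'text', 'empty']
```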
The following example marks the whitespace in an XML document with the labels of the rules above::
<root>AAAA
AAAA<number>BBBB4.4E+12CCC</number>AAAA
AAAA<poem>BBBBEs<em>DDDDwar</em> einmal und <em>istEEEE</em>nicht mehrF
FFFFein <strong>riesengroßer</strong><em>DDDDTeddybär</em>,F
der aßFFFFdie <em>MilchEEEE</em>und trank das BrotFFFF
und als er starb da <strong>war erEEEE</strong><em>tot</em>.CCCC</poem>AAAA
</root>
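The following shows the formatted XML document. Whitespace marked A, B or C is removed, whitespace marked D or E is moved out of the nested element, and each run marked F is replaced by a single blank::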
<root>
  <number>4.4E+12</number>
  <poem>Es <em>war</em> einmal und <em>ist</em> nicht mehr ein <strong>riesengroßer</strong> <em>Teddybär</em>, der aß die <em>Milch</em> und trank das Brot und als er starb da <strong>war er</strong> <em>tot</em>.</poem>
</root>
Options
Formatting can be controlled by several parameters passed when the Formatter object is constructed. Elements that should be left unformatted are given as a list of element names in the preserve parameter. All descendants of preserved elements are left unformatted as well::
from xmlformatter import Formatter
formatter = Formatter(preserve=["preserve"])
print(formatter.format_file("/home/pa/doc.xml"))
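The effect of preserve can be pictured as a recursive walk that stops at preserved elements. This is a sketch with the standard library only: `collapse_tree` is a hypothetical helper that applies just rule F, and its `preserve` argument merely mirrors the option's name, not XmlFormatter's internals.

```python
import re
import xml.etree.ElementTree as ET

WS_RUN = re.compile(r"[ \t\r\n]+")  # runs of XML whitespace


def collapse_tree(elem, preserve=()):
    """Collapse whitespace runs (rule F) in elem and its descendants,
    skipping preserved elements and everything below them.
    Hypothetical helper; `preserve` mirrors the option's name only."""
    if elem.tag in preserve:
        return  # leave the preserved subtree exactly as parsed
    if elem.text:
        elem.text = WS_RUN.sub(" ", elem.text)
    for child in elem:
        collapse_tree(child, preserve)
        # A child's tail belongs to elem's content, so it is formatted
        # even when the child itself is preserved
        if child.tail:
            child.tail = WS_RUN.sub(" ", child.tail)


doc = ET.fromstring("<doc><p>a   b</p><pre>  x\n    y</pre>  tail   end</doc>")
collapse_tree(doc, preserve=["pre"])
print(repr(ET.tostring(doc, encoding="unicode")))
# '<doc><p>a b</p><pre>  x\n    y</pre> tail end</doc>'
```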
The indentation width is set by indent (default 2) and the indentation character by indentChar::
from xmlformatter import Formatter
formatter = Formatter(indent="1", indentChar="\t")
print(formatter.format_file("/home/pa/doc.xml"))
Indentation can be suppressed by setting compress to True or by setting indent to 0::
from xmlformatter import Formatter
formatter = Formatter(compress=True)
print(formatter.format_file("/home/pa/doc.xml"))
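Rule G together with the indent, indentChar and compress options can be sketched as follows. `indent_tree` and its parameters `indent`, `indent_char` and `compress` are hypothetical names modeled on the options, not XmlFormatter's API. The sketch only touches element content and leaves text and mixed content alone. With `indent=0` it still breaks lines but adds no indentation; the text above does not say whether XmlFormatter keeps the line breaks in that case.

```python
import xml.etree.ElementTree as ET

WS_CHARS = " \t\r\n"  # XML whitespace characters


def has_element_content(elem):
    # Child elements with nothing but whitespace between them
    return len(elem) > 0 and not (elem.text or "").strip(WS_CHARS) and all(
        not (child.tail or "").strip(WS_CHARS) for child in elem
    )


def indent_tree(elem, indent=2, indent_char=" ", compress=False, depth=0):
    """Sketch of rule G with hypothetical parameters named after the options.

    Each element in element content goes on its own line, indented by
    indent * indent_char per level; compress=True drops the whitespace
    between elements instead. Text and mixed content are left alone.
    """
    if not has_element_content(elem):
        return
    if compress:
        inner = outer = ""
    else:
        inner = "\n" + indent_char * (int(indent) * (depth + 1))
        outer = "\n" + indent_char * (int(indent) * depth)
    elem.text = inner
    for child in elem:
        child.tail = inner
        indent_tree(child, indent, indent_char, compress, depth + 1)
    elem[-1].tail = outer  # the closing tag lines up with the opening tag


root = ET.fromstring("<root><number>4.4E+12</number>"
                     "<poem>Es <em>war</em></poem></root>")
indent_tree(root)
print(ET.tostring(root, encoding="unicode"))
# <root>
#   <number>4.4E+12</number>
#   <poem>Es <em>war</em></poem>
# </root>
```

`int(indent)` accepts both `2` and strings such as `"4"`, matching how indent is passed in the examples above.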
The encoding of the input document can be set by encoding_input; by default, it is read from the XML declaration or assumed to be UTF-8. The encoding of the output can be set by encoding_output::
from xmlformatter import Formatter
formatter = Formatter(encoding_input="ISO-8859-1", encoding_output="ISO-8859-1")
print(formatter.format_file("/home/pa/doc.xml"))
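The default described above (explicit encoding, else the XML declaration, else UTF-8) can be sketched with the standard library. `decode_input` and `encode_output` are hypothetical helpers named after the options, not XmlFormatter's code. Turning unencodable characters into character references is this sketch's own choice, and it does not rewrite the encoding named in the XML declaration.

```python
import re

# Matches the encoding pseudo-attribute of a leading XML declaration
DECL_ENCODING = re.compile(rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def decode_input(data, encoding_input=None):
    """Decode raw bytes: explicit encoding first, then the XML declaration,
    then UTF-8. Hypothetical helper named after the encoding_input option."""
    if encoding_input is None:
        match = DECL_ENCODING.match(data)
        encoding_input = match.group(1).decode("ascii") if match else "utf-8"
    return data.decode(encoding_input)


def encode_output(text, encoding_output="utf-8"):
    # Characters missing from the target encoding become character
    # references (a choice of this sketch, not documented behavior)
    return text.encode(encoding_output, "xmlcharrefreplace")


raw = '<?xml version="1.0" encoding="ISO-8859-1"?><r>Bär</r>'.encode("iso-8859-1")
print(encode_output(decode_input(raw), "ascii"))
# b'<?xml version="1.0" encoding="ISO-8859-1"?><r>B&#228;r</r>'
```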
Methods
XmlFormatter can format XML documents given as a file path or as a string::
from xmlformatter import Formatter
formatter = Formatter()
# format a document given by its path
print(formatter.format_file("/home/pa/doc.xml"))
# format a document given as a string
formatted = formatter.format_string("<root>XML document</root>")
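One plausible way for the two methods to relate is for the path-based method to read the file and delegate to the string-based one. The class below is a hypothetical stand-in, not `xmlformatter.Formatter`; its `format_string` applies only rule F as a placeholder, and reading the file as UTF-8 is an assumption of the sketch.

```python
import re


class SketchFormatter:
    """Hypothetical stand-in, NOT xmlformatter.Formatter: shows one way a
    path-based method can delegate to a string-based one."""

    def format_string(self, xml):
        # Placeholder for the real rules: collapse whitespace runs (rule F)
        return re.sub(r"[ \t\r\n]+", " ", xml)

    def format_file(self, path):
        # Read the whole document (assumed UTF-8), then reuse format_string
        with open(path, encoding="utf-8") as fh:
            return self.format_string(fh.read())


formatter = SketchFormatter()
print(formatter.format_string("<root>XML\n   document</root>"))
# <root>XML document</root>
```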
xmlformat.py
XmlFormatter includes a command line tool, xmlformat.py, which wraps the XmlFormatter class. Its parameters are named after the options::
xmlformat [--preserve "pre,literal"] [--compress] [--indent num] [--outfile file] [--encoding enc] [--outencoding enc] [--help] <--infile file|file>
xmlformat.py can also read the document from STDIN::
cat /home/pa/doc.xml | python xmlformat.py
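A command line like the usage line above can be sketched with `argparse`. This is a hypothetical sketch, not the shipped xmlformat.py: it only mirrors the flag names listed there, and the default of 2 for `--indent` is taken from the indent option. Input comes from `--infile` or the positional file, with STDIN as the fallback.

```python
import argparse
import sys


def build_parser():
    """argparse sketch of the documented usage line (hypothetical, not the
    shipped xmlformat.py; only the flag names are taken from the docs)."""
    p = argparse.ArgumentParser(prog="xmlformat")  # --help is built in
    p.add_argument("--preserve", default="",
                   help='comma-separated element names, e.g. "pre,literal"')
    p.add_argument("--compress", action="store_true")
    p.add_argument("--indent", type=int, default=2)
    p.add_argument("--outfile")
    p.add_argument("--encoding")
    p.add_argument("--outencoding")
    p.add_argument("--infile")
    p.add_argument("file", nargs="?")
    return p


def read_input(args):
    # --infile or the positional file wins; otherwise read raw bytes from STDIN
    path = args.infile or args.file
    if path:
        with open(path, "rb") as fh:
            return fh.read()
    return sys.stdin.buffer.read()


args = build_parser().parse_args(["--preserve", "pre,literal", "--compress", "doc.xml"])
preserve = [name for name in args.preserve.split(",") if name]
print(preserve, args.compress, args.file)  # ['pre', 'literal'] True doc.xml
```

Reading bytes rather than text leaves encoding detection to a later step, as with the encoding options above.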
Note
XmlFormatter is built on top of the expat parser and is therefore subject to its limitations. XmlFormatter is published under the MIT license.
Product's homepage
Requirements:
· Python