XmlFormatter

0.1.4 MIT/X Consortium License    

Format, compress XML documents

XmlFormatter is an open source Python class that formats XML documents. It differs from other formatters in how it handles whitespace: it applies a distinct set of formatting rules (see below) that treat element content as objects and mixed content as written text. Formatting is suspended for elements marked as preserved. You might find it most useful for tasks involving corrections or presentations. Typical usage looks like this::

from xmlformatter import Formatter

formatter = Formatter(indent="4")
print(formatter.format_file("/home/pa/doc.xml"))


The Object Style treats element content as the storage of object properties. Therefore all surrounding whitespace is removed and sequences of whitespace are collapsed. Take the following document as an example::

<complex>
 <real> 4.4E+12</real>
 <imaginary>5.4E-11
 </imaginary>
</complex>


The following shows the XML document formatted in Object Style::

<complex>
 <real>4.4E+12</real>
 <imaginary>5.4E-11</imaginary>
</complex>
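The two Object Style steps can be approximated with Python's standard xml.etree.ElementTree module. This is a minimal sketch, not XmlFormatter's implementation: the helpers object_style and collapse are hypothetical, the code assumes every element holds either child elements or a single value (no mixed content), and it borrows ElementTree.indent (Python 3.9+) for the line breaks and indentation, here two blanks per level:

```python
import xml.etree.ElementTree as ET

def collapse(text):
    """Strip surrounding whitespace and collapse inner runs to one blank."""
    return " ".join(text.split()) if text else text

def object_style(elem):
    """Hypothetical Object Style pass; assumes no mixed content anywhere."""
    if len(elem):
        # Element content: whitespace between child elements carries no data.
        elem.text = None
        for child in elem:
            object_style(child)
            child.tail = None
    else:
        # A value such as 4.4E+12: trim it and collapse whitespace runs.
        elem.text = collapse(elem.text)
    return elem

source = """<complex>
 <real> 4.4E+12</real>
 <imaginary>5.4E-11
 </imaginary>
</complex>"""

root = object_style(ET.fromstring(source))
ET.indent(root, space="  ")  # line breaks and indentation for element content
print(ET.tostring(root, encoding="unicode"))
```

Clearing the whitespace first and indenting afterwards keeps the two concerns (removing layout whitespace, adding new layout) separate.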


The Text Style treats mixed content as the storage of written text. Therefore leading and trailing whitespace inside nested elements is moved out to the surrounding text nodes. If there is no surrounding text node, XmlFormatter inserts one containing a single blank outside the nested element. Sequences of whitespace are collapsed to a single blank. Take the following document as an example::

 <poem> Es<em> war</em> einmal und <em>ist </em>nicht mehr...</poem>

Nested elements are handled like object properties, except that their leading and trailing whitespace is merged into the adjacent text nodes instead of being removed::

 <poem>Es <em>war</em> einmal und <em>ist</em> nicht mehr...</poem>
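The whitespace moves of the Text Style can be sketched with ElementTree as well. This is a hypothetical illustration rather than XmlFormatter's code: text_style assumes the nested elements contain only text (no grandchildren), and it treats only space, tab, CR and LF as whitespace, as XML does:

```python
import re
import xml.etree.ElementTree as ET

XML_WS = " \t\r\n"  # the four whitespace characters XML recognises

def squeeze(s):
    """Replace each run of whitespace with a single blank."""
    return re.sub(r"[ \t\r\n]+", " ", s or "")

def text_style(parent):
    """Hypothetical Text Style pass; assumes nested elements contain only text."""
    for i, child in enumerate(parent):
        text = child.text or ""
        if text and text[0] in XML_WS:
            # Move leading whitespace to the preceding text node,
            # creating that node if it does not exist yet.
            if i == 0:
                parent.text = (parent.text or "") + " "
            else:
                parent[i - 1].tail = (parent[i - 1].tail or "") + " "
        if text and text[-1] in XML_WS:
            # Move trailing whitespace to the following text node.
            child.tail = " " + (child.tail or "")
        child.text = squeeze(text).strip(XML_WS)
    # Collapse every text node, then trim the ends of the mixed content.
    parent.text = squeeze(parent.text).lstrip(XML_WS)
    for child in parent:
        child.tail = squeeze(child.tail)
    if len(parent):
        parent[-1].tail = parent[-1].tail.rstrip(XML_WS)
    else:
        parent.text = parent.text.rstrip(XML_WS)
    return parent

poem = ET.fromstring("<poem> Es<em> war</em> einmal und <em>ist </em>nicht mehr...</poem>")
print(ET.tostring(text_style(poem), encoding="unicode"))
```

Collapsing only after all moves have been made means a moved blank that lands next to an existing blank still ends up as a single blank.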

Both styles are applied together within an XML document. The formatting rules are:

A: surrounding whitespace is removed from element content

B: leading whitespace is removed from element content

C: trailing whitespace is removed from element content

D: leading whitespace in nested elements is moved to the preceding text node (which is inserted if missing) within mixed content

E: trailing whitespace in nested elements is moved to the following text node (which is inserted if missing) within mixed content

F: sequences of whitespace characters (n > 0) are replaced by a single blank " " within element and mixed content

G: within element content, child elements are placed on new lines and indented
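Which of these rules apply depends on what kind of content an element has. The helper below is a hypothetical sketch, not part of XmlFormatter: it classifies an element with ElementTree, and the mapping from each kind to rule labels in the comments is an interpretation of the list above:

```python
import xml.etree.ElementTree as ET

XML_WS = " \t\r\n"

def content_kind(elem):
    """Hypothetical classifier: 'element', 'mixed' or 'text' content."""
    if len(elem) == 0:
        return "text"     # a single value: rules B, C and F
    texts = [elem.text] + [child.tail for child in elem]
    if any((t or "").strip(XML_WS) for t in texts):
        return "mixed"    # written text with nested elements: rules B to F
    return "element"      # only child elements: rules A and G

root = ET.fromstring(
    "<root>\n  <number> 4.4E+12 </number>\n  <poem>Es<em> war</em> einmal</poem>\n</root>"
)
for elem in root.iter():
    print(elem.tag, content_kind(elem))
```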

The following example marks each whitespace with the label of the rule that applies to it::

<root>AAAA
AAAA<number>BBBB4.4E+12CCC</number>AAAA
AAAA<poem>BBBBEs<em>DDDDwar</em> einmal und <em>istEEEE</em>nicht mehrF
FFFFein <strong>riesengroßer</strong><em>DDDDTeddybär</em>,F
der aßFFFFdie <em>MilchEEEE</em>und trank das BrotFFFF
und als er starb da <strong>war erEEEE</strong><em>tot</em>.CCCC</poem>AAAA
</root>


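The following shows the formatted XML document: whitespace marked A, B and C is removed, whitespace marked D and E is moved into the adjacent text nodes, each F sequence becomes a single blank, and element content is indented (G)::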

<root>
 <number>4.4E+12</number>
 <poem>Es <em>war</em> einmal und <em>ist</em> nicht mehr ein <strong>riesengroßer</strong> <em>Teddybär</em>, der aß die <em>Milch</em> und trank das Brot und als er starb da <strong>war er</strong> <em>tot</em>.</poem>
</root>


Options

Formatting can be controlled by several parameters passed when the Formatter object is constructed. Elements that are to be left unformatted are given as a list of element names in the preserve parameter.

All descendants of preserved elements are also left unformatted::

 from xmlformatter import Formatter

 formatter = Formatter(preserve=["preserve"])
 print(formatter.format_file("/home/pa/doc.xml"))
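Skipping preserved subtrees is naturally expressed as a recursive walk that stops descending at a preserved element. The sketch below is hypothetical and uses ElementTree: format_tree only collapses whitespace (standing in for the full rule set), and it assumes that the text following a preserved element (its tail) still belongs to the formatted parent:

```python
import xml.etree.ElementTree as ET

def collapse(text):
    """Stand-in formatting step: trim and collapse whitespace."""
    return " ".join(text.split()) if text else text

def format_tree(elem, preserve):
    """Hypothetical walk: preserved elements and all their descendants are skipped."""
    if elem.tag in preserve:
        return elem
    elem.text = collapse(elem.text)
    for child in elem:
        format_tree(child, preserve)
        # The tail lies outside the child, so it is formatted with the parent.
        child.tail = collapse(child.tail)
    return elem

doc = ET.fromstring("<doc><p>  a   b </p><pre>  keep <b>  this </b></pre></doc>")
print(ET.tostring(format_tree(doc, {"pre"}), encoding="unicode"))
```

Because the walk returns before touching a preserved element, the nested <b> inside <pre> keeps its whitespace too, matching the rule that descendants of preserved elements stay unformatted.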


The indentation width is set by indent (default 2), and the indentation character by indentChar::

from xmlformatter import Formatter

formatter = Formatter(indent="1", indentChar="\t")
print(formatter.format_file("/home/pa/doc.xml"))
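One way to picture indent and indentChar together is that each nesting level adds indentChar repeated indent times. The sketch below assumes that reading and reproduces the effect with the standard library's ElementTree.indent (Python 3.9+); indent_unit is a hypothetical helper, not part of XmlFormatter:

```python
import xml.etree.ElementTree as ET

def indent_unit(indent=2, indent_char=" "):
    """Whitespace added per nesting level; indent may be given as a string."""
    return indent_char * int(indent)

root = ET.fromstring("<a><b><c/></b></a>")
ET.indent(root, space=indent_unit("1", "\t"))  # one tab per level
print(ET.tostring(root, encoding="unicode"))
```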


Indentation can be suppressed by setting compress to True or indent to 0::

from xmlformatter import Formatter

formatter = Formatter(compress=True)
print(formatter.format_file("/home/pa/doc.xml"))
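Compressing must not remove the single blanks that separate words in mixed content. The hypothetical sketch below (ElementTree, not XmlFormatter's code) therefore drops whitespace-only text nodes only where an element holds nothing but child elements, which amounts to rule A without rule G:

```python
import xml.etree.ElementTree as ET

def compress(elem):
    """Hypothetical compress pass: no line breaks or indentation in element content."""
    texts = [elem.text] + [child.tail for child in elem]
    if len(elem) and not any(t and t.strip() for t in texts):
        # Element content: every text node is layout whitespace, so drop it.
        elem.text = None
        for child in elem:
            child.tail = None
    for child in elem:
        compress(child)
    return elem

source = "<root>\n  <number>4.4E+12</number>\n  <poem>Es <em>war</em> <em>ist</em></poem>\n</root>"
print(ET.tostring(compress(ET.fromstring(source)), encoding="unicode"))
```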


The encoding of the input document can be set by encoding_input. By default, the encoding is read from the XML declaration, and UTF-8 is assumed if the declaration does not name one. The encoding of the output can be set by encoding_output::

from xmlformatter import Formatter

formatter = Formatter(encoding_input="ISO-8859-1", encoding_output="ISO-8859-1")
print(formatter.format_file("/home/pa/doc.xml"))


Methods

XmlFormatter can format XML documents given either as a file path or as a string::

from xmlformatter import Formatter

formatter = Formatter()
# file
print(formatter.format_file("/home/pa/doc.xml"))
# string
formatted = formatter.format_string("<root>XML document</root>")


xmlformat.py

XmlFormatter includes a command line tool, xmlformat.py, which wraps the Formatter class. Its parameters are named after the options::

xmlformat [--preserve "pre,literal"] [--compress] [--indent num] [--outfile file] [--encoding enc] [--outencoding enc] [--help] < --infile file|file >


xmlformat.py can also read from STDIN::

 cat /home/pa/doc.xml | python xmlformat.py
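The option syntax above maps naturally onto Python's argparse. The following is a hypothetical sketch, not the source of xmlformat.py: it splits --preserve on commas and falls back to STDIN when neither --infile nor a positional file is given. How the parsed values are handed to Formatter is left out:

```python
import argparse
import io
import sys

def build_parser():
    """Hypothetical parser mirroring the documented xmlformat options."""
    p = argparse.ArgumentParser(prog="xmlformat")
    p.add_argument("--preserve", default="", help='comma-separated names, e.g. "pre,literal"')
    p.add_argument("--compress", action="store_true")
    p.add_argument("--indent", type=int, default=2)
    p.add_argument("--outfile")
    p.add_argument("--encoding")
    p.add_argument("--outencoding")
    p.add_argument("--infile")
    p.add_argument("file", nargs="?")
    return p

def read_input(args, stdin=None):
    """Read the document from --infile, the positional file, or STDIN."""
    path = args.infile or args.file
    if path:
        with open(path, "rb") as f:
            return f.read()
    return (stdin if stdin is not None else sys.stdin.buffer).read()

args = build_parser().parse_args(["--preserve", "pre,literal", "--indent", "4"])
preserve = [name for name in args.preserve.split(",") if name]
print(preserve, args.indent)
print(read_input(args, io.BytesIO(b"<doc/>")))
```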

Note


XmlFormatter is built on top of the expat parser and is therefore subject to expat's limitations. XmlFormatter is published under the MIT license.
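To illustrate what building on expat means: expat does not produce a tree but reports a stream of events, and whitespace arrives as ordinary character data that a formatter has to buffer and rearrange. The sketch below uses the standard-library xml.parsers.expat module; the events helper is hypothetical and only lists those events for a fragment of the poem:

```python
import xml.parsers.expat

def events(data):
    """Collect expat's start, end and character-data events in document order."""
    out = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # merge adjacent character data into one event
    parser.StartElementHandler = lambda name, attrs: out.append(("start", name))
    parser.EndElementHandler = lambda name: out.append(("end", name))
    parser.CharacterDataHandler = lambda text: out.append(("text", text))
    parser.Parse(data, True)
    return out

for event in events(b"<poem> Es<em> war</em></poem>"):
    print(event)
```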
Last updated on March 9th, 2012
