lxml Changelog

New in lxml 3.4.1 (Jan 3, 2015)

  • Features added:
  • New htmlfile HTML generator to accompany the incremental xmlfile serialisation API. Patch by Burak Arslan.
  • Bugs fixed:
  • lxml.sax.ElementTreeContentHandler did not initialise its superclass.
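
A minimal sketch of the htmlfile generator mentioned above, assuming lxml 3.4.1 or later. The element names and the in-memory BytesIO target are illustrative choices, not part of the release notes.

```python
from io import BytesIO
from lxml import etree

buf = BytesIO()

# htmlfile mirrors the incremental xmlfile API, but serialises as HTML.
with etree.htmlfile(buf) as hf:
    with hf.element("html"):
        with hf.element("body"):
            with hf.element("p"):
                hf.write("Hello")
            # Complete elements can be written as well as plain text.
            hf.write(etree.Element("br"))

html_bytes = buf.getvalue()
print(html_bytes)
```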

New in lxml 3.3.1 (Feb 12, 2014)

  • Bugs fixed:
  • LP#1014290: HTML documents parsed with parser.feed() failed to find elements during tag iteration.
  • LP#1273709: Building in PyPy failed due to missing support for PyUnicode_Compare() and PyByteArray_*() in PyPy's C-API.
  • LP#1274413: Compilation in MSVC failed due to missing "stdint.h" standard header file.
  • LP#1274118: iterparse() failed to parse BOM prefixed files.

New in lxml 3.0 Alpha 2 (Aug 27, 2012)

  • Features added:
  • The .iter() method of elements now accepts tag arguments like "{*}name" to search for elements with a given local name in any namespace. With this addition, all combinations of wildcards now work as expected: "{ns}name", "{}name", "{*}name", "{ns}*", "{}*" and "{*}*". Note that "name" is equivalent to "{}name", but "*" is "{*}*". The same change applies to the .getiterator(), .itersiblings(), .iterancestors(), .iterdescendants(), .iterchildren() and .itertext() methods, to the strip_attributes(), strip_elements() and strip_tags() functions, and to the iterparse() class.
  • C14N allows specifying the inclusive prefixes to be promoted to top-level during exclusive serialisation.
  • Bugs fixed:
  • Passing long Unicode strings into the feed() parser interface failed to read the entire string.
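
The wildcard forms accepted by .iter() can be illustrated with a small sketch. The document and the namespace URIs urn:a and urn:b are made up for this example.

```python
from lxml import etree

root = etree.fromstring(
    '<root xmlns:a="urn:a" xmlns:b="urn:b">'
    '<a:item/><b:item/><item/><a:other/>'
    '</root>'
)

# "{*}item": local name "item" in any namespace, including none.
any_ns = [el.tag for el in root.iter("{*}item")]

# "{}item": only elements named "item" that have no namespace.
no_ns = [el.tag for el in root.iter("{}item")]

# "{urn:a}*": every element in the urn:a namespace.
in_a = [el.tag for el in root.iter("{urn:a}*")]

print(any_ns)
print(no_ns)
print(in_a)
```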

New in lxml 2.3.5 (Aug 1, 2012)

  • Crash when merging text nodes in element.remove().
  • Crash in sax/target parser when reporting empty doctype.

New in lxml 2.3.4 (Mar 27, 2012)

  • Crash when building an nsmap (Element property) with empty namespace URIs.
  • Crash due to race condition when errors (or user messages) occur during threaded XSLT processing.
  • XSLT stylesheet compilation could ignore compilation errors.

New in lxml 2.3.2 (Nov 14, 2011)

  • Features added:
  • lxml.objectify.deannotate() has a new boolean option cleanup_namespaces to remove the objectify namespace declarations (and generally clean up the namespace declarations) after removing the type annotations.
  • lxml.objectify gained its own SubElement() function as a copy of etree.SubElement to avoid an otherwise redundant import of lxml.etree on the user side.
  • Bugs fixed:
  • Fixed the "descendant" bug in cssselect a second time (after a first fix in lxml 2.3.1). The previous change resulted in a serious performance regression for the XPath based evaluation of the translated expression. Note that this breaks the usage of some of the generated XPath expressions as XSLT location paths that previously worked in 2.3.1.
  • Fixed parsing of some selectors in cssselect. Whitespace after the combinators ">", "+" and "~" is now correctly ignored. Previously it was parsed as a descendant combinator. For example, "div > .foo" was parsed the same as "div >* .foo" instead of "div >.foo".
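
A short sketch of the two objectify additions in this release: objectify.SubElement() and the cleanup_namespaces option of deannotate(). The element names are hypothetical.

```python
from lxml import etree, objectify

root = objectify.Element("root")
root.price = 5                      # creates a child with a type annotation
objectify.SubElement(root, "note")  # no separate lxml.etree import needed for this

annotated = etree.tostring(root)

# Remove the type annotations and the now unused namespace declarations.
objectify.deannotate(root, cleanup_namespaces=True)
cleaned = etree.tostring(root)

print(annotated)
print(cleaned)
```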

New in lxml 2.3.1 (Sep 26, 2011)

  • Features added:
  • New option kill_tags in lxml.html.clean to remove specific tags and their content (i.e. their whole subtree).
  • pi.get() and pi.attrib on processing instructions to parse pseudo-attributes from the text content of processing instructions.
  • lxml.get_include() returns a list of include paths that can be used to compile external C code against lxml.etree. This is specifically required for statically linked lxml builds when code needs to compile against the exact same header file versions as lxml itself.
  • Resolver.resolve_file() takes an additional option close_file that configures if the file(-like) object will be closed after reading or not. By default, the file will be closed, as the user is not expected to keep a reference to it.
  • Bugs fixed:
  • HTML cleaning didn't remove 'data:' links.
  • The html5lib parser integration now uses the 'official' implementation in html5lib itself, which makes it work with newer releases of the library.
  • In lxml.sax, endElementNS() could incorrectly reject a plain tag name when the corresponding start event inferred the same plain tag name to be in the default namespace.
  • When an open file-like object is passed into parse() or iterparse(), the parser will no longer close it after use. This reverts a change in lxml 2.3 where all files would be closed. It is the user's responsibility to properly close the file(-like) object, also in error cases.
  • Assertion error in lxml.html.cleaner when discarding top-level elements.
  • In lxml.cssselect, use the xpath 'A//B' (short for 'A/descendant-or-self::node()/B') instead of 'A/descendant::B' for the css descendant selector ('A B'). This makes a few edge cases consistent with the selector behavior in WebKit and Firefox, and makes more css expressions valid location paths (for use in xsl:template match).
  • In lxml.html, non-selected tags no longer show up in the collected form values.
  • Adding/removing values to/from a multiple select form field properly selects them and unselects them.
  • Other changes:
  • Static builds can specify the download directory with the --download-dir option.
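
Two of the changes above can be sketched briefly: reading pseudo-attributes from a processing instruction, and parse() leaving a passed-in file-like object open. The PI target "my-app" and its pseudo-attributes are invented for illustration.

```python
from io import BytesIO
from lxml import etree

# Pseudo-attributes are parsed out of the PI's text content.
pi = etree.ProcessingInstruction("my-app", 'mode="fast" level="3"')
mode = pi.get("mode")
pseudo_attrs = dict(pi.attrib)

# The parser no longer closes file-like objects it did not open itself.
source = BytesIO(b"<root/>")
tree = etree.parse(source)
still_open = not source.closed
source.close()  # closing remains the caller's job

print(mode, pseudo_attrs, still_open)
```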

New in lxml 2.3 (Feb 7, 2011)

  • Features added:
  • When looking for children, lxml.objectify takes '{}tag' as meaning an empty namespace, as opposed to the parent namespace.
  • Bugs fixed:
  • When finished reading from a file-like object, the parser immediately calls its .close() method.
  • When finished parsing, iterparse() immediately closes the input file.
  • Work-around for libxml2 bug that can leave the HTML parser in a non-functional state after parsing a severely broken document (fixed in libxml2 2.7.8).
  • The misspelled marque tag in the HTML cleanup code is now correctly named marquee.
  • Other changes:
  • Some public functions in the Cython-level C-API have more explicit return types.
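
The '{}tag' child lookup in lxml.objectify might be used as in the following sketch. The namespace urn:ns and the element names are assumptions made for the example.

```python
from lxml import objectify

root = objectify.fromstring(
    '<root xmlns="urn:ns">'
    '<a>namespaced</a>'
    '<a xmlns="">plain</a>'
    '</root>'
)

# Plain attribute access looks for the child in the parent's namespace.
in_parent_ns = root.a.text

# "{}a" explicitly asks for a child without any namespace.
in_empty_ns = root["{}a"].text

print(in_parent_ns, in_empty_ns)
```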

New in lxml 2.2.8 / 2.3 Beta 1 (Sep 7, 2010)

  • Crash in newer libxml2 versions when moving elements between documents that had attributes on replaced XInclude nodes.
  • XMLID() function was missing the optional parser and base_url parameters.
  • Searching for wildcard tags in iterparse() was broken in Py3.
  • lxml.html.open_in_browser() didn't work in Python 3 due to the use of os.tempnam. It now takes an optional 'encoding' parameter.

New in lxml 2.2.8 (Sep 2, 2010)

  • Crash in newer libxml2 versions when moving elements between documents that had attributes on replaced XInclude nodes.
  • Import fix for urljoin in Python 3.1+.

New in lxml 2.2.7 (Jul 25, 2010)

  • Bugs fixed:
  • Crash in XSLT when generating text-only result documents with a stylesheet created in a different thread.

New in lxml 2.2.6 (Mar 2, 2010)

  • Fixed several Python 3 regressions by building with Cython 0.11.3.

New in lxml 2.2.5 (Feb 28, 2010)

  • Features added:
  • Support for running XSLT extension elements on the input root node (e.g. in a template matching on "/").
  • Bugs fixed:
  • Crash in XPath evaluation when reading smart strings from a document other than the original context document.
  • Support recent versions of html5lib by not requiring its XHTMLParser in htmlparser.py anymore.
  • Manually instantiating the custom element classes in lxml.objectify could crash.
  • Invalid XML text characters were not rejected by the API when they appeared in unicode strings directly after non-ASCII characters.
  • lxml.html.open_http_urllib() did not work in Python 3.
  • The functions strip_tags() and strip_elements() in lxml.etree did not remove all occurrences of a tag in all cases.
  • Crash in XSLT extension elements when the XSLT context node is not an element.

New in lxml 2.2.2 (Jun 22, 2009)

  • Features added:
  • New helper functions strip_attributes(), strip_elements(), strip_tags() in lxml.etree to remove attributes/subtrees/tags from a subtree.
  • Bugs fixed:
  • Namespace cleanup on subtree insertions could result in missing namespace declarations (and potentially crashes) if the element defining a namespace was deleted and the namespace was not used by the top element of the inserted subtree but only in deeper subtrees.
  • Raising an exception from a parser target callback didn't always terminate the parser.
  • Only {true, false, 1, 0} are accepted as the lexical representation for BoolElement ({True, False, T, F, t, f} not any more), restoring lxml
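
The three strip helpers added in this release can be sketched on a small, made-up document:

```python
from lxml import etree

root = etree.fromstring(
    '<doc id="1" class="x">'
    '<p>Hello <b>bold</b> world<script>bad()</script></p>'
    '</doc>'
)

etree.strip_attributes(root, "class")                   # drop the attribute only
etree.strip_elements(root, "script", with_tail=False)   # drop the whole subtree
etree.strip_tags(root, "b")                             # drop the tag, keep its text

stripped = etree.tostring(root)
print(stripped)
```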

New in lxml 2.2.1 (Jun 3, 2009)

  • Features added:
  • Injecting default attributes into a document during XML Schema validation (also at parse time).
  • Pass huge_tree parser option to disable parser security restrictions imposed by libxml2 2.7.
  • Bugs fixed:
  • The script for statically building libxml2 and libxslt didn't work in Py3.
  • XMLSchema() also passes invalid schema documents on to libxml2 for parsing (which could lead to a crash before release 2.6.24).
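
A hedged sketch of default attribute injection during XML Schema validation, using the attribute_defaults option of XMLSchema(). The schema and the "lang" attribute are invented for the example. The last line only shows how the huge_tree parser option is passed.

```python
from lxml import etree

XSD = b"""\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="item">
    <xs:complexType>
      <xs:attribute name="lang" type="xs:string" default="en"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD), attribute_defaults=True)

doc = etree.fromstring("<item/>")
is_valid = schema.validate(doc)   # validation also injects the defaults
lang = doc.get("lang")

# Parser with libxml2's security restrictions on tree size disabled.
huge_parser = etree.XMLParser(huge_tree=True)

print(is_valid, lang)
```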

New in lxml 2.2 (Mar 21, 2009)

  • Features added:
  • Support for standalone flag in XML declaration through tree.docinfo.standalone and by passing standalone=True/False on serialisation.
  • Bugs fixed:
  • Crash when parsing an XML Schema with external imports from a filename.
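
Reading and writing the standalone flag could look like this sketch. The input document is a minimal made-up example.

```python
from io import BytesIO
from lxml import etree

tree = etree.parse(BytesIO(b'<?xml version="1.0" standalone="yes"?><root/>'))

# The flag from the XML declaration is exposed on docinfo.
was_standalone = tree.docinfo.standalone

# Override the flag when serialising.
out = etree.tostring(tree, xml_declaration=True, encoding="UTF-8",
                     standalone=False)

print(was_standalone)
print(out)
```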

New in lxml 2.2 Beta 4 (Feb 27, 2009)

  • Features added:
  • Support strings and instantiable Element classes as child arguments to the constructor of custom Element classes.
  • GZip compression support for serialisation to files and file-like objects.
  • Bugs fixed:
  • Deep-copying an ElementTree copied neither its sibling PIs and comments nor its internal/external DTD subsets.
  • Soupparser failed on broken attributes without values.
  • Crash in XSLT when overwriting an already defined attribute using xsl:attribute.
  • Crash bug in exception handling code under Python 3. This was due to a problem in Cython, not lxml itself.
  • lxml.html.FormElement._name() failed for non top-level forms.
  • TAG special attribute in constructor of custom Element classes was evaluated incorrectly.
  • Other changes:
  • Official support for Python 3.0.1.
  • Element.findtext() now returns an empty string instead of None for Elements without text content.
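
As an illustration of the GZip support for serialisation, the following sketch writes a small tree to an in-memory file-like object with the compression option and decompresses it again with the standard library. The tree content is made up.

```python
import gzip
from io import BytesIO
from lxml import etree

root = etree.Element("root")
etree.SubElement(root, "child").text = "data"

buf = BytesIO()
# compression takes a gzip level; 0 means no compression.
etree.ElementTree(root).write(buf, compression=9)

restored = gzip.decompress(buf.getvalue())
print(restored)
```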

New in lxml 2.2 Beta 3 (Feb 18, 2009)

  • Features added:
  • XSLT.strparam() class method to wrap quoted string parameters that require escaping.
  • Bugs fixed:
  • Memory leak in XPath evaluators.
  • Crash when parsing indented XML in one thread and merging it with other documents parsed in another thread.
  • Setting the base attribute in lxml.objectify from a unicode string failed.
  • Fixes following changes in Python 3.0.1.
  • Minor fixes for Python 3.
  • Other changes:
  • The global error log (which is copied into the exception log) is now local to a thread, which fixes some race conditions.
  • More robust error handling on serialisation.
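
XSLT.strparam() is useful when a string parameter contains both kinds of quotes and therefore cannot be written as a simple XPath string literal. A minimal sketch, with a made-up stylesheet and parameter name:

```python
from lxml import etree

STYLESHEET = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:param name="message"/>
  <xsl:template match="/">
    <xsl:value-of select="$message"/>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.fromstring(STYLESHEET))

tricky = 'It\'s a "quoted" string'
# strparam() wraps the value so that it is passed as a string, not as XPath.
result = transform(etree.fromstring("<doc/>"),
                   message=etree.XSLT.strparam(tricky))
output = str(result)
print(output)
```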

New in lxml 2.2 Beta 2 (Jan 26, 2009)

  • Bugs fixed:
  • Potential memory leak on exception handling. This was due to a problem in Cython, not lxml itself.
  • iter_links (and related link-rewriting functions) in lxml.html would interpret CSS like url("link") incorrectly (treating the quotation marks as part of the link).
  • Failing import on systems that have an io module.

New in lxml 2.1.5 (Jan 6, 2009)

  • Bugs fixed:
  • Potential memory leak on exception handling. This was due to a problem in Cython, not lxml itself.
  • Failing import on systems that have an io module.

New in lxml 2.2 Alpha 1 (Nov 24, 2008)

  • Features added:
  • Support for XSLT result tree fragments in XPath/XSLT extension functions.
  • QName objects have new properties namespace and localname.
  • New options for exclusive C14N and C14N without comments.
  • Instantiating a custom Element class creates a new Element.
  • Bugs fixed:
  • XSLT didn't inherit the parse options of the input document.
  • 0-bytes could slip through the API when used inside of Unicode strings.
  • With lxml.html.clean.autolink, links with balanced parentheses that end in a parenthesis will be linked in their entirety (typical with Wikipedia links).
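
The new QName properties can be sketched as follows. The namespace URI urn:example is an assumption for the example.

```python
from lxml import etree

qname = etree.QName("{urn:example}item")

namespace = qname.namespace   # namespace URI part
localname = qname.localname   # local name part
full_name = qname.text        # the complete "{ns}name" form

print(namespace, localname, full_name)
```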

New in lxml 2.1.3 (Nov 18, 2008)

  • Bugs fixed:
  • Ref-count leaks when lxml enters a try-except statement while an outside exception lives in sys.exc_*(). This was due to a problem in Cython, not lxml itself.
  • Parser Unicode decoding errors could get swallowed by other exceptions.
  • Name/import errors in some Python modules.
  • Internal DTD subsets that did not specify a system or public ID were not serialised and did not appear in the docinfo property of ElementTrees.
  • Fix a pre-Py3k warning when parsing from a gzip file in Py2.6.
  • Test suite fixes for libxml2 2.7.
  • Resolver.resolve_string() did not work for non-ASCII byte strings.
  • Resolver.resolve_file() was broken.
  • Overriding the parser encoding didn't work for many encodings.