lxml Changelog

New in version 3.4.1 (January 3rd, 2015)

  • Features added:
  • New htmlfile HTML generator to accompany the incremental xmlfile serialisation API. Patch by Burak Arslan.
  • Bugs fixed:
  • lxml.sax.ElementTreeContentHandler did not initialise its superclass.
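
As an illustration of the htmlfile entry above, here is a minimal sketch of incremental HTML generation. It assumes that htmlfile follows the same context-manager pattern as xmlfile; the in-memory buffer and the element names are made up for the example.

```python
from io import BytesIO

from lxml import etree

# Write into an in-memory buffer; a real file path or file object works too.
buf = BytesIO()

with etree.htmlfile(buf) as hf:
    # Each element() context opens a tag on entry and closes it on exit.
    with hf.element("html"):
        with hf.element("body"):
            with hf.element("p"):
                hf.write("Hello, world")

html_bytes = buf.getvalue()
print(html_bytes)
```

The output is produced piece by piece, so large documents do not have to be built as a complete tree in memory first.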

New in version 3.3.1 (February 12th, 2014)

  • Bugs fixed:
  • LP#1014290: HTML documents parsed with parser.feed() failed to find elements during tag iteration.
  • LP#1273709: Building in PyPy failed due to missing support for PyUnicode_Compare() and PyByteArray_*() in PyPy's C-API.
  • LP#1274413: Compilation in MSVC failed due to missing "stdint.h" standard header file.
  • LP#1274118: iterparse() failed to parse BOM-prefixed files.
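
The scenario behind LP#1014290 can be sketched as follows: an HTML document is pushed into the parser in chunks via feed(), and the resulting tree is then searched by tag. The sample markup is invented for the illustration.

```python
from lxml import etree

parser = etree.HTMLParser()

# Feed the document in two arbitrary chunks, as a network reader might.
for chunk in ("<html><body><p>one</p>", "<p>two</p></body></html>"):
    parser.feed(chunk)

# close() finishes parsing and returns the root element.
root = parser.close()

# Tag iteration over the fed document.
texts = [p.text for p in root.iter("p")]
print(texts)
```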

New in version 3.0 Alpha 2 (August 27th, 2012)

  • Features added:
  • The .iter() method of elements now accepts tag arguments like "{*}name" to search for elements with a given local name in any namespace. With this addition, all combinations of wildcards now work as expected: "{ns}name", "{}name", "{*}name", "{ns}*", "{}*" and "{*}*". Note that "name" is equivalent to "{}name", but "*" is "{*}*". The same change applies to the .getiterator(), .itersiblings(), .iterancestors(), .iterdescendants(), .iterchildren() and .itertext() methods, the strip_attributes(), strip_elements() and strip_tags() functions, and the iterparse() class.
  • C14N allows specifying the inclusive prefixes to be promoted to top-level during exclusive serialisation.
  • Bugs fixed:
  • Passing long Unicode strings into the feed() parser interface failed to read the entire string.
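
The wildcard tag forms accepted by .iter() in this release can be tried out with a small sketch. The namespace URIs and element names are hypothetical and chosen only to show how the patterns differ.

```python
from lxml import etree

NS_A = "http://a.example/ns"
NS_B = "http://b.example/ns"

root = etree.fromstring(
    '<root xmlns:a="%s" xmlns:b="%s">'
    "<a:item/><b:item/><item/><a:other/>"
    "</root>" % (NS_A, NS_B)
)

# "{*}item": local name "item" in any namespace, including none.
any_ns_items = [el.tag for el in root.iter("{*}item")]

# "{}item": local name "item" in no namespace only.
no_ns_items = [el.tag for el in root.iter("{}item")]

# A plain name is equivalent to "{}name".
plain_items = [el.tag for el in root.iter("item")]

# "{ns}*": any element in the given namespace.
ns_a_elements = [el.tag for el in root.iter("{%s}*" % NS_A)]

print(any_ns_items, no_ns_items, plain_items, ns_a_elements)
```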

New in version 2.3.5 (August 1st, 2012)

  • Bugs fixed:
  • Crash when merging text nodes in element.remove().
  • Crash in sax/target parser when reporting empty doctype.

New in version 2.3.4 (March 27th, 2012)

  • Bugs fixed:
  • Crash when building an nsmap (Element property) with empty namespace URIs.
  • Crash due to race condition when errors (or user messages) occur during threaded XSLT processing.
  • XSLT stylesheet compilation could ignore compilation errors.

New in version 2.3.2 (November 14th, 2011)

  • Features added:
  • lxml.objectify.deannotate() has a new boolean option cleanup_namespaces to remove the objectify namespace declarations (and generally clean up the namespace declarations) after removing the type annotations.
  • lxml.objectify gained its own SubElement() function as a copy of etree.SubElement to avoid an otherwise redundant import of lxml.etree on the user side.
  • Bugs fixed:
  • Fixed the "descendant" bug in cssselect a second time (after a first fix in lxml 2.3.1). The previous change resulted in a serious performance regression for the XPath based evaluation of the translated expression. Note that this breaks the usage of some of the generated XPath expressions as XSLT location paths that previously worked in 2.3.1.
  • Fixed parsing of some selectors in cssselect. Whitespace after the combinators ">", "+" and "~" is now correctly ignored. Previously it was parsed as a descendant combinator. For example, "div > .foo" was parsed the same as "div >* .foo" instead of "div >.foo".
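
The two objectify additions in this release can be combined in a short sketch: build a small tree with objectify.SubElement(), then strip the type annotations and the namespace declarations they leave behind. The element names are illustrative; lxml.etree is imported here only for serialisation.

```python
from lxml import etree, objectify

# objectify.Element() annotates the new element with type information.
root = objectify.Element("root")

# objectify.SubElement() mirrors etree.SubElement().
item = objectify.SubElement(root, "item")

# Assigning a Python value creates an annotated child element.
root.count = 5

annotated = etree.tostring(root)

# Remove the annotations and the then-unused namespace declarations.
objectify.deannotate(root, cleanup_namespaces=True)
cleaned = etree.tostring(root)

print(annotated)
print(cleaned)
```

Without cleanup_namespaces=True, the annotation attributes are removed but the namespace declarations on the root element stay in the serialised output.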

New in version 2.3.1 (September 26th, 2011)

  • Features added:
  • New option kill_tags in lxml.html.clean to remove specific tags and their content (i.e. their whole subtree).
  • pi.get() and pi.attrib on processing instructions to parse pseudo-attributes from the text content of processing instructions.
  • lxml.get_include() returns a list of include paths that can be used to compile external C code against lxml.etree. This is specifically required for statically linked lxml builds when code needs to compile against the exact same header file versions as lxml itself.
  • Resolver.resolve_file() takes an additional option close_file that configures whether the file(-like) object will be closed after reading. By default, the file will be closed, as the user is not expected to keep a reference to it.
  • Bugs fixed:
  • HTML cleaning didn't remove 'data:' links.
  • The html5lib parser integration now uses the 'official' implementation in html5lib itself, which makes it work with newer releases of the library.
  • In lxml.sax, endElementNS() could incorrectly reject a plain tag name when the corresponding start event inferred the same plain tag name to be in the default namespace.
  • When an open file-like object is passed into parse() or iterparse(), the parser will no longer close it after use. This reverts a change in lxml 2.3 where all files would be closed. It is the user's responsibility to properly close the file(-like) object, also in error cases.
  • Assertion error in lxml.html.cleaner when discarding top-level elements.
  • In lxml.cssselect, use the XPath 'A//B' (short for 'A/descendant-or-self::node()/B') instead of 'A/descendant::B' for the CSS descendant selector ('A B'). This makes a few edge cases consistent with the selector behavior in WebKit and Firefox, and makes more CSS expressions valid location paths (for use in xsl:template match).
  • In lxml.html, non-selected tags no longer show up in the collected form values.
  • Adding/removing values to/from a multiple select form field properly selects them and unselects them.
  • Other changes:
  • Static builds can specify the download directory with the --download-dir option.
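
Two entries from this release lend themselves to a quick sketch: reading pseudo-attributes from a processing instruction with pi.get() and pi.attrib, and the restored behaviour that parse() leaves a caller-supplied file-like object open. The stylesheet name and the documents are made up for the example.

```python
from io import BytesIO

from lxml import etree

# A top-level processing instruction is reachable as a sibling of the root.
root = etree.fromstring(
    b'<?xml-stylesheet type="text/xsl" href="style.xsl"?><root/>'
)
pi = root.getprevious()

# Pseudo-attributes are parsed out of the PI's text content.
href = pi.get("href")
pseudo_attrs = dict(pi.attrib)
missing = pi.get("media")  # not present in the PI

# parse() reads from an open file-like object without closing it;
# closing it remains the caller's job.
source = BytesIO(b"<doc><child/></doc>")
tree = etree.parse(source)
still_open = not source.closed
source.close()

print(href, pseudo_attrs, missing, still_open)
```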

New in version 2.3 (February 7th, 2011)

  • Features added:
  • When looking for children, lxml.objectify takes '{}tag' as meaning an empty namespace, as opposed to the parent namespace.
  • Bugs fixed:
  • When finished reading from a file-like object, the parser immediately calls its .close() method.
  • When finished parsing, iterparse() immediately closes the input file.
  • Work-around for a libxml2 bug that can leave the HTML parser in a non-functional state after parsing a severely broken document (fixed in libxml2 2.7.8).
  • The marque tag in the HTML cleanup code is now correctly named marquee.
  • Other changes:
  • Some public functions in the Cython-level C-API have more explicit return types.
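
The '{}tag' lookup in lxml.objectify described under "Features added" can be illustrated with a namespaced parent that has one child in its own namespace and one child in no namespace. The namespace URI and element names are hypothetical.

```python
from lxml import objectify

root = objectify.fromstring(
    '<n:root xmlns:n="http://ns.example/n">'
    "<n:a>1</n:a>"
    "<a>2</a>"
    "</n:root>"
)

# Plain attribute access looks for the child in the parent's namespace.
same_ns_text = root.a.text

# '{}a' explicitly asks for a child in no namespace.
no_ns_text = getattr(root, "{}a").text

print(same_ns_text, no_ns_text)
```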

New in version 2.2.8 / 2.3 Beta 1 (September 7th, 2010)

  • Bugs fixed:
  • Crash in newer libxml2 versions when moving elements between documents that had attributes on replaced XInclude nodes.
  • XMLID() function was missing the optional parser and base_url parameters.
  • Searching for wildcard tags in iterparse() was broken in Py3.
  • lxml.html.open_in_browser() didn't work in Python 3 due to the use of os.tempnam. It now takes an optional 'encoding' parameter.
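
A small sketch of XMLID() with the optional parser and base_url parameters mentioned above. The document, the ID values and the URL are invented for the illustration.

```python
from lxml import etree

# A custom parser, here one that drops ignorable whitespace.
parser = etree.XMLParser(remove_blank_text=True)

root, ids = etree.XMLID(
    '<root>\n  <a id="x1"/>\n  <b id="x2"/>\n</root>',
    parser=parser,
    base_url="http://example.com/doc.xml",  # hypothetical document URL
)

# XMLID() returns the root element and a dict mapping "id" values to elements.
id_names = sorted(ids)
first_tag = ids["x1"].tag

print(id_names, first_tag)
```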