PDFTextStream Changelog

New in version 2.3.2 (September 16th, 2011)

  • This version includes a variety of fixes that ensure PDFTextStream can extract text from PDF documents that do not conform to the PDF specification.
  • It also includes a variety of performance enhancements.

New in version 2.3.0 (April 24th, 2009)

  • Added an .isStruckThrough() method to com.snowtide.pdf.TextUnit, indicating whether a character has a strikethrough drawn through it.
  • Improved PDFTextStream's support for embedded character mappings.
  • The calculation of whitespace between words has been fixed to properly account for whitespace that is explicitly encoded in the source PDF documents.
  • Improved PDFTextStream's handling of composite content encodings, which could previously fail, causing some ranges of PDF content to be 'ignored' during extraction.
  • Fixed a bug in VisualOutputTarget where text from a single line would be split over multiple lines.
  • Improved vertical alignment of text extracted using VisualOutputTarget.
  • Improved VisualOutputTarget-produced extracts to eliminate spurious additional whitespace between closely adjacent words.

New in version 2.2.5 (December 31st, 2008)

  • This release adds support for extracting XFA forms data as XML.
  • It significantly improves the performance of text extraction using VisualOutputTarget.
  • It adds support for PDF documents larger than 2GB.
  • It fixes a bug where the encodings from embedded Type1 fonts were not being applied properly in some circumstances.
  • It fixes a problem where newer content in updated PDF documents was sometimes being ignored.
  • It fixes a problem where PDFDocEncoding-encoded bookmarks and metadata were not being decoded properly.
  • It adds a .getDestinationName() method to com.snowtide.pdf.Bookmark.