New in version 2.3.2 (September 16th, 2011)
- This version includes a variety of fixes to ensure that PDFTextStream can extract text from PDF documents that do not conform to the PDF specification.
- It also includes a variety of performance enhancements.
New in version 2.3.0 (April 24th, 2009)
- Added an .isStruckThrough() method to com.snowtide.pdf.TextUnit, indicating whether a character has a strikethrough drawn through it.
- Improved PDFTextStream's support for embedded character mappings.
- The calculation of whitespace between words has been fixed to properly account for whitespace that is explicitly encoded in the source PDF documents.
- Improved PDFTextStream's handling of composite content encodings, which could previously fail, causing some ranges of PDF content to be ignored during extraction.
- Fixed a bug in VisualOutputTarget where text from a single line would be split over multiple lines.
- Improved vertical alignment of text extracted using VisualOutputTarget.
- Improved VisualOutputTarget-produced extracts to eliminate spurious whitespace between closely adjacent words.
New in version 2.2.5 (December 31st, 2008)
- Added support for extracting XFA forms data as XML.
- Significantly improved the performance of text extraction using VisualOutputTarget.
- Added support for PDF documents larger than 2GB.
- Fixed a bug where the encodings from embedded Type1 fonts were not being applied properly in some circumstances.
- Fixed a problem where newer content in updated PDF documents was sometimes being ignored.
- Fixed a problem where PDFDocEncoding-encoded bookmarks and metadata were not being decoded properly.
- Added a .getDestinationName() method to com.snowtide.pdf.Bookmark.