What's new in MuPDF 1.24.0 RC 1
Mar 13, 2024
- Error handling changes:
- You must call fz_report_error in the final fz_catch. Any unreported errors will be automatically reported when a new error is raised, or when closing the fitz context.
- New formats:
- Read Office (XML) files! We internally open and convert docx/pptx/xlsx documents to HTML to allow reading the plain text content. The exact layout will NOT be preserved.
- Compile-time option to use libarchive for reading CBR and other archive formats.
- Read plain text documents.
- Read gzipped files directly.
- Open and read FDF files to support importing annotations or form data using the low-level PDF functions. There are no tools for this yet.
- Read CFB (Compound File Binary) format archives -- used for the Office formats.
- Write images as JPEG2000.
- New tools and features:
- mutool bake (and associated functions) to bake appearance of annotations and forms into static content.
- Font subsetting flag for mutool clean (EXPERIMENTAL FEATURE).
- Option to use ObjStms when writing PDF files.
- Compression effort option when writing PDF files.
- Add option to control how line art is affected by redaction. Add more options to control how images are affected by redaction (remove-unless-invisible).
- Fix up q/Q gstate balance when cleaning content streams.
- New functions and types:
- pdf_rearrange_pages to subset or re-order pages in a PDF file.
- fz_invert_bitmap to invert monochrome bitmaps.
- fz_compressed_image_type to query the format of a compressed image.
- fz_text_decoder to convert various legacy and CJK encodings into UTF-8.
- More helper functions to easily manipulate PDF objects in C.
- Add flag to control fz_place_story overflow behavior when the text doesn't fit into the box.
- New archive handlers can be added at runtime.
- Major bug fixes and improvements:
- Support using Art, Bleed, Media, and Trim boxes for PDF page size.
- Support ActualText in PDF! No more strange text extraction when the file uses ActualText to patch over bad font encodings.
- Add special TrueType fallback encoding CMap for a specific flavor of broken PDF files that use an "identity" encoding without embedding the font.
- Limited "transfer function" support in PDF. Transfer functions are a deprecated legacy PDF feature that predates proper color management. They were intended to provide limited color management such as applying a gamma curve. Transfer functions have often been (ab)-used to invert images, and many PDF creators use them when writing softmask images. We have added support for this case only.
- Box drawing characters added to fonts for HTML and plain text documents.
- Write more compact PDF files (removed some unnecessary whitespace).
- Improved selection behavior for non-axis aligned text.
- Improved heuristics for detecting the logical and visual order of RTL text in PDF.
- Improved heuristics for inserting missing spaces in PDF text.
- Improved handling of CMYK JPEG files (which ones are inverted and which are not).
- Improved content type detection. Don't assume everything is PDF when we can't recognize it.
- Removed deprecated functions:
- pdf_check_signature
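Several of the format items above (reading gzipped files, reading CFB archives, and no longer assuming that unrecognized input is PDF) depend on recognizing a file from its leading bytes. The following is a minimal sketch of that idea using well-known magic numbers. The helper name `sniff_format` and the 1024-byte search window are assumptions made for illustration; this is not MuPDF's API or its actual detection logic.

```python
# Illustrative only: `sniff_format` is a hypothetical helper, not a MuPDF function.
GZIP_MAGIC = b"\x1f\x8b"                         # gzip member header
CFB_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # Compound File Binary signature
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP local file header (docx/pptx/xlsx containers)
PDF_MAGIC = b"%PDF-"


def sniff_format(head: bytes):
    """Return a format label for the leading bytes, or None if unrecognized."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(CFB_MAGIC):
        return "cfb"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    # Assumption: allow some junk before the PDF header by searching a short window.
    if PDF_MAGIC in head[:1024]:
        return "pdf"
    # Unknown input is reported as unknown rather than guessed to be PDF.
    return None
```

Returning `None` for unknown data, instead of falling back to PDF, lets the caller try other handlers or give up cleanly.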
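The "q/Q gstate balance" item above refers to the PDF operators q (save graphics state) and Q (restore graphics state), which must pair up within a content stream. As a rough sketch of one way to balance them, assuming the stream has already been split into operator and operand tokens: drop any Q that has no matching q, and append a Q for each q left open. The function name `balance_gstate` is hypothetical, and MuPDF's own repair strategy may differ.

```python
# Illustrative only: `balance_gstate` is a hypothetical helper, not a MuPDF function.
def balance_gstate(tokens):
    """Return a copy of `tokens` in which every q has a matching Q."""
    balanced = []
    depth = 0  # number of q operators currently open
    for tok in tokens:
        if tok == "q":
            depth += 1
        elif tok == "Q":
            if depth == 0:
                continue  # stray restore with nothing to pop: drop it
            depth -= 1
        balanced.append(tok)
    balanced.extend(["Q"] * depth)  # close any saves left open at end of stream
    return balanced
```

For example, `balance_gstate(["Q", "q", "q", "Q"])` yields `["q", "q", "Q", "Q"]`: the leading stray Q is dropped and one Q is appended.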
New in MuPDF 1.23.8 (Jan 8, 2024)
- Move previously private APIs into public headers so they can be used in python bindings.
- Add version numbers to shared library installation targets on Linux/OpenBSD.
- Avoid setuptools problems for python bindings in python 3.12.
- Fix makefile so python bindings build with tesseract.
New in MuPDF 1.23.7 (Jan 8, 2024)
- Fix rendering issue concerning group alpha.
- Fix unexpected HTML table rectangles on subsequent pages.
- Fix text extraction of control characters from PDF.
- Fix bug concerning Stories having page-break-after set.
- Ignore broken structure trees instead of reporting an error.
- Various fixes for pymupdf.
New in MuPDF 1.23.6 (Jan 8, 2024)
- Add new text file document handler.
- Add interface for rearranging pages.
- Fix double free bug in html parser.
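The page-rearranging interface listed above both subsets and re-orders pages. A minimal model of that behavior, assuming the operation is driven by a list of zero-based page indices: the result contains exactly the listed pages, in the listed order. The name `rearrange` is hypothetical and works on a plain Python list, not a PDF document.

```python
# Illustrative only: `rearrange` models the idea on a list; it is not a MuPDF function.
def rearrange(pages, order):
    """Return the pages selected by `order`, in that order."""
    for index in order:
        if not 0 <= index < len(pages):
            raise IndexError(f"page index {index} out of range")
    return [pages[index] for index in order]
```

With pages `["a", "b", "c", "d"]` and order `[2, 0]`, the result is `["c", "a"]`: pages omitted from `order` are dropped.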
New in MuPDF 1.23.5 (Jan 8, 2024)
- Use CropBox as origin for fitz space in PDF documents so that page bounding box origin is at the top left.
- Fix parsing of cmap with surrogate characters.
- Fix bug in story handling resetting.
- Various smaller fixes for pymupdf.
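The CropBox change above means the visible page's top-left corner becomes the origin. PDF user space has its y axis pointing up, while fitz space has y pointing down. A small sketch of the resulting mapping, assuming an unrotated page and ignoring UserUnit scaling; both function names are hypothetical.

```python
# Illustrative only: these helpers are hypothetical, not MuPDF functions.
# A box is (x0, y0, x1, y1) in PDF user space, with y increasing upwards.
def pdf_point_to_page_space(x, y, cropbox):
    """Map a PDF user-space point to top-left-origin, y-down page space."""
    x0, y0, x1, y1 = cropbox
    return (x - x0, y1 - y)


def page_bounds(cropbox):
    """Page bounding box in page space: origin at (0, 0), size of the CropBox."""
    x0, y0, x1, y1 = cropbox
    return (0.0, 0.0, x1 - x0, y1 - y0)
```

For a CropBox of `(10, 20, 610, 812)`, the PDF point `(10, 812)` (the top-left corner) maps to `(0, 0)`, and the page bounds are `(0, 0, 600, 792)`.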
New in MuPDF 1.23.4 (Oct 11, 2023)
- Fix crash when cleaning up the Android draw device on destroy.
- Fix bug where bitmaps were reused after being recycled in Android.
- Add fixed padding to ink annotation to avoid unselectable bboxes for tiny strokes.
- Add API for checking if an annotation has a Rect property.
- Fix bug where cycles in structure trees caused infinite loops.
- Fix bug where colorspaces were not retained for inline images during filtering.
- Change default to use CropBox rather than MediaBox.
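The structure tree fix above concerns malformed files in which a node is reachable from its own descendants, so a naive recursive walk never terminates. One common guard, sketched here on plain dictionaries with a hypothetical "kids" key, is to remember which nodes are on the current path and refuse to descend into them again. This shows the general technique only and is not MuPDF's implementation.

```python
# Illustrative only: nodes are plain dicts with an optional "kids" list.
def walk_structure(node, visit, _on_path=None):
    """Depth-first walk that skips any child already on the current path."""
    on_path = _on_path if _on_path is not None else set()
    if id(node) in on_path:
        return  # cycle: this node is its own ancestor, so stop descending
    on_path.add(id(node))
    visit(node)
    for kid in node.get("kids", []):
        walk_structure(kid, visit, on_path)
    on_path.discard(id(node))  # leaving this branch; siblings may revisit safely
```

Tracking only the current path, rather than every node seen so far, means a node shared by two branches is still visited from each of them, while a true cycle is cut.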
New in MuPDF 1.22.1 (Aug 16, 2023)
- Garbage collection problem causing file bloat on clean
- Don't assume sorted objects in pdf_objcmp
- Don't layout empty documents
- Type 3 font char bboxes