MuPDF Changelog

New in MuPDF 1.24.0 RC 1 (Mar 13, 2024)

  • Error handling changes:
  • You must call fz_report_error in the final fz_catch. Any unreported errors will be automatically reported when a new error is raised, or when closing the fitz context.
  • New formats:
  • Read Office (XML) files! We internally open and convert docx/pptx/xlsx documents to HTML to allow reading the plain text content. The exact layout will NOT be preserved.
  • Compile-time option to use libarchive for reading CBR and other archive formats.
  • Read plain text documents.
  • Read gzipped files directly.
  • Open and read FDF files to support importing annotations or form data using the low-level PDF functions. There are no tools for this yet.
  • Read CFB (Compound File Binary) format archives -- used for the Office formats.
  • Write images as JPEG2000.
  • New tools and features:
  • mutool bake (and associated functions) to bake appearance of annotations and forms into static content.
  • Font subsetting flag to mutool clean (EXPERIMENTAL FEATURE).
  • Option to use ObjStms when writing PDF files.
  • Compression effort option when writing PDF files.
  • Add option to control how line art is affected by redaction. Add more options to control how images are affected by redaction (remove-unless-invisible).
  • Fix up q/Q gstate balance when cleaning content streams.
  • New functions and types:
  • pdf_rearrange_pages to subset or re-order pages in a PDF file.
  • fz_invert_bitmap to invert monochrome bitmaps.
  • fz_compressed_image_type to query the format of a compressed image.
  • fz_text_decoder to convert various legacy and CJK encodings into UTF-8.
  • More helper functions to easily manipulate PDF objects in C.
  • Add flag to control fz_place_story overflow behavior when the text doesn't fit into the box.
  • New archive handlers can be added at runtime.
  • Major bug fixes and improvements:
  • Support using Art, Bleed, Media, and Trim boxes for PDF page size.
  • Support ActualText in PDF! No more strange text extraction when the file uses ActualText to patch over bad font encodings.
  • Add special TrueType fallback encoding CMap for a specific flavor of broken PDF files that use an "identity" encoding without embedding the font.
  • Limited "transfer function" support in PDF. Transfer functions are a deprecated legacy PDF feature that predates proper color management. They were intended to provide limited color management such as applying a gamma curve. Transfer functions have often been (ab)-used to invert images, and many PDF creators use them when writing softmask images. We have added support for this case only.
  • Box drawing characters added to fonts for HTML and plain text documents.
  • Write more compact PDF files (removed some unnecessary whitespace).
  • Improved selection behavior for non-axis aligned text.
  • Improved heuristics for detecting the logical and visual order of RTL text in PDF.
  • Improved heuristics for inserting missing spaces in PDF text.
  • Improved handling of CMYK JPEG files (which ones are inverted and which are not).
  • Improved content type detection. Don't assume everything is PDF when we can't recognize it.
  • Removed deprecated functions:
  • pdf_check_signature
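
The inverting transfer function case mentioned in the list above can be pictured with a small, library-independent sketch. This is not MuPDF's implementation: the names transfer_fn, invert_transfer, build_transfer_lut, and apply_transfer_lut are hypothetical, and the code assumes 8-bit softmask samples. A transfer function maps a component value in [0, 1] to another value in [0, 1]; the inverting one is f(x) = 1 - x.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: a transfer function maps [0, 1] to [0, 1]. */
typedef float (*transfer_fn)(float x);

/* The inverting transfer function often attached to softmask images. */
float invert_transfer(float x)
{
    return 1.0f - x;
}

/* Sample a transfer function into a 256-entry table for 8-bit data. */
void build_transfer_lut(transfer_fn fn, unsigned char lut[256])
{
    int i;
    assert(fn != NULL);
    for (i = 0; i < 256; i++)
    {
        float y = fn((float)i / 255.0f);
        /* Clamp to the valid output range before quantizing. */
        if (y < 0.0f) y = 0.0f;
        if (y > 1.0f) y = 1.0f;
        lut[i] = (unsigned char)(y * 255.0f + 0.5f);
    }
}

/* Apply the table in place to a buffer of 8-bit samples. */
void apply_transfer_lut(const unsigned char lut[256], unsigned char *samples, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        samples[i] = lut[samples[i]];
}
```

Building the table once and then mapping every sample through it keeps the per-sample cost to a single lookup, which is why this sketch samples the function instead of calling it per pixel.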

New in MuPDF 1.23.8 (Jan 8, 2024)

  • Move previously private APIs into public headers so they can be used in Python bindings.
  • Add version numbers to shared library installation targets on Linux/OpenBSD.
  • Avoid setuptools problems for Python bindings in Python 3.12.
  • Fix Makefile so Python bindings build with Tesseract.

New in MuPDF 1.23.7 (Jan 8, 2024)

  • Fix rendering issue concerning group alpha.
  • Fix unexpected HTML table rectangles on subsequent pages.
  • Fix text extraction of control characters from PDF.
  • Fix bug concerning Stories having page-break-after set.
  • Ignore broken structure trees instead of reporting an error.
  • Various fixes for pymupdf.

New in MuPDF 1.23.6 (Jan 8, 2024)

  • Add new text file document handler.
  • Add interface for rearranging pages.
  • Fix double free bug in HTML parser.

New in MuPDF 1.23.5 (Jan 8, 2024)

  • Use CropBox as origin for fitz space in PDF documents so that page bounding box origin is at the top left.
  • Fix parsing of CMap with surrogate characters.
  • Fix bug in story handling resetting.
  • Various smaller fixes for pymupdf.
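
The CropBox origin change listed above can be illustrated with a small coordinate conversion. This is a sketch under stated assumptions, not MuPDF code: it assumes an unrotated page with a UserUnit of 1, and the names box, point, and pdf_to_page_space are hypothetical. PDF user space has its origin at the lower left with y growing upward, while fitz space has its origin at the top left with y growing downward.

```c
#include <assert.h>

/* Hypothetical types: a PDF box with lower-left (x0, y0) and upper-right (x1, y1). */
typedef struct { float x0, y0, x1, y1; } box;
typedef struct { float x, y; } point;

/* Map a point from PDF user space into a top-left-origin page space
 * anchored at the CropBox, flipping the y axis. */
point pdf_to_page_space(box cropbox, point p)
{
    point q;
    assert(cropbox.x1 >= cropbox.x0 && cropbox.y1 >= cropbox.y0);
    q.x = p.x - cropbox.x0;
    q.y = cropbox.y1 - p.y;
    return q;
}
```

With a CropBox of (10, 20, 605, 812), the PDF point (10, 812) maps to (0, 0), so the page bounding box starts at the origin even though the CropBox itself is offset.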

New in MuPDF 1.23.4 (Oct 11, 2023)

  • Fix bug causing a crash when cleaning up Android draw device upon destroy.
  • Fix bug where bitmaps were reused after being recycled in Android.
  • Add fixed padding to ink annotation to avoid unselectable bboxes for tiny strokes.
  • Add API for checking if an annotation has a Rect property.
  • Fix bug where cycles in structure trees caused infinite loops.
  • Fix bug where colorspaces were not retained for inline images during filtering.
  • Change default to use CropBox rather than MediaBox.
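
The structure tree cycle problem listed above is an instance of a general hazard: a recursive walk over a malformed tree never terminates if a node can reach itself. The following is a minimal sketch of one common guard, marking nodes while they are on the current path. It is not the actual MuPDF fix; the node type and count_nodes function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node with a mark used only during traversal. */
typedef struct node
{
    struct node **kids;
    size_t kid_count;
    int marked; /* nonzero while the node is on the current traversal path */
} node;

/* Count node visits below n, refusing to re-enter a node already on the path. */
size_t count_nodes(node *n)
{
    size_t i, total = 1;
    assert(n != NULL);
    if (n->marked)
        return 0; /* cycle detected: this node is already being visited */
    n->marked = 1;
    for (i = 0; i < n->kid_count; i++)
        total += count_nodes(n->kids[i]);
    n->marked = 0; /* unmark on the way out so later walks start clean */
    return total;
}
```

Clearing the mark on the way out means the guard only rejects true cycles; a node that is legitimately shared by two parents is still visited from each of them.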

New in MuPDF 1.22.1 (Aug 16, 2023)

  • Fix garbage collection problem causing file bloat on clean.
  • Don't assume sorted objects in pdf_objcmp.
  • Don't layout empty documents.
  • Fix Type 3 font character bounding boxes.