PdfParser is a free, open source, platform-independent PHP library that comprises several utilities for extracting data from PDF (Portable Document Format) files. It can be used inside a web application or deployed as a standalone library.
Features at a glance
It can load and parse headers and objects, extract metadata (such as description, author, or keywords), read compressed PDF documents, and extract text from ordered pages.
Additionally, the software supports several character encodings (Mac OS Roman and Windows ANSI), complies with the PSR-0 and PSR-1 standards, and is compatible with Composer. It can also handle octal and hexadecimal content encodings in text sections.
Unfortunately, PdfParser does not currently support secured PDF documents, so files that have been encrypted or password protected cannot be parsed with it.
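To illustrate what octal and hexadecimal encodings in a text section look like, here is a minimal sketch in Python. It is not PdfParser's code: the function names are hypothetical, and decoding bytes as Latin-1 is a simplifying assumption, since in a real PDF the mapping from bytes to characters depends on the font's encoding.

```python
import re


def decode_octal_escapes(literal: str) -> str:
    """Replace backslash-octal escapes (1 to 3 digits) in a PDF literal string.

    Hypothetical helper for illustration; other escapes such as \\n or \\(
    are left untouched.
    """
    def replace(match: re.Match) -> str:
        # Keep only the low byte, since each escape encodes a single byte.
        return chr(int(match.group(1), 8) & 0xFF)

    return re.sub(r"\\([0-7]{1,3})", replace, literal)


def decode_hex_string(hex_body: str) -> str:
    """Decode the body of a PDF hex string (the text between < and >).

    Whitespace between digits is ignored, and a missing final digit is
    treated as 0. Latin-1 decoding is an assumption made for this sketch.
    """
    digits = "".join(hex_body.split())
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits).decode("latin-1")


print(decode_octal_escapes(r"Hello\040World"))  # prints: Hello World
print(decode_hex_string("4865 6C6C 6F"))        # prints: Hello
```

A parser that supports both forms has to recognise which one it is reading (parentheses for literal strings, angle brackets for hex strings) before applying the matching decoder.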
Getting started with PdfParser
PdfParser is designed to run on a web server. It is a PHP library, so it requires PHP 5.3 or later to be installed on your server. It is derived from the TCPDF parser library.
Installing PdfParser is straightforward, as the package can be downloaded to your server automatically with the Composer command-line program. First add the software to your composer.json file, then run the ‘composer update smalot/pdfparser’ command to download it.
As mentioned, PdfParser can also be deployed as a standalone library. To do so, grab its source code from GitHub, then run the ‘composer update’ command to download any dependencies and generate the autoload.php file. More details can be found on the project’s website (see link below).
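As a rough illustration, the composer.json entry could look like the fragment below. The package name comes from the command quoted above. The "*" version constraint is an assumption made for this sketch; in practice you would probably pin a specific release.

```json
{
    "require": {
        "smalot/pdfparser": "*"
    }
}
```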
Under the hood and supported operating systems
Under the hood, PdfParser is written in PHP, a server-side programming language. This means it can be used on any operating system where PHP is available, on both 32-bit and 64-bit architectures.