libextractor is a library for extracting metadata from files in many different formats.
Supported formats include HTML, PS, PDF, OLE2 (DOC, XLS, PPT), StarOffice (sdw), OpenOffice (sxw), DVI, MP3 (ID3v1 and ID3v2), MAN, OGG, JPEG, WAV, GIF, TIFF, PNG, DEB, TAR(.GZ), RPM, ZIP, REAL, ELF, RIFF (AVI), QT, MPEG and ASF.
What's New in This Release:
· Major changes to the plugin mechanism now allow out-of-process plugins full random access to the entire file.
· Most plugins have been rewritten to the new plugin API.
· The external (libextractor) API remains unchanged and compatible with 0.6.
· As part of the rewrite, many plugins were changed to use standard third-party libraries (libjpeg, libtiff, libgif, libtidy, and libmagic) for parsing.
· A new plugin based on GStreamer replaces many existing multimedia plugins.
· Automated test cases were written for almost all of the plugins, and the documentation was updated.