Apache Tika 1.4
A free and open source content analysis toolkit distributed by the Apache Software Foundation
Apache Tika supports the following document formats: HyperText Markup Language (HTML), XML and derived formats, Microsoft Office document formats, OpenDocument Format (ODF), Portable Document Format (PDF), Electronic Publication Format (EPUB), Rich Text Format (RTF), compression and packaging formats, text/audio/image/video formats, the mbox format, and Java class files and archives.
Previously, Apache Tika was a sub-project of the Apache Lucene software library. Now it is distributed as a standalone package by the Apache Software Foundation.
What's New in This Release:
- Removed a test HTML file with a poorly chosen GPL text in it (TIKA-1129).
- Improvements to tika-server to allow it to produce text/html and text/xml content (TIKA-1126, TIKA-1127).
- Improvements were made to the Compressor Parser to handle gzipped files that require the decompressConcatenated option set to true (TIKA-1096).
- Fixed a typographic error that was preventing detection of awk files (TIKA-1081).
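
For readers unfamiliar with the concatenated gzip case behind TIKA-1096: a gzip file may consist of several compressed members written back to back, and a reader that stops after the first member silently drops the rest. The sketch below illustrates that file shape using only the Python standard library. It is not Tika code, and the function names are hypothetical, chosen for illustration.

```python
import gzip
import zlib


def first_member_only(data: bytes) -> bytes:
    """Decode only the first gzip member; later members are left unread."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return decoder.decompress(data)


def all_members(data: bytes) -> bytes:
    """Decode every gzip member in sequence and join the results."""
    return gzip.decompress(data)


# Two independently compressed members joined into one byte string,
# which is what a "concatenated" gzip file looks like on disk.
concatenated = gzip.compress(b"first part\n") + gzip.compress(b"second part\n")

print(first_member_only(concatenated))
print(all_members(concatenated))
```

A reader that behaves like `first_member_only` returns truncated content for such files, which is presumably why an option to decompress all concatenated members matters to a text extraction tool.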