Apache Tika 1.4

A free and open source content analysis toolkit distributed by the Apache Software Foundation
Apache Tika is an open source toolkit that detects and extracts metadata and structured text content from a wide range of document types by building on existing parser libraries.

Apache Tika supports the following document formats: HyperText Markup Language (HTML), XML and derived formats, Microsoft Office document formats, OpenDocument Format (ODF), Portable Document Format (PDF), Electronic Publication Format (EPUB), Rich Text Format (RTF), compression and packaging formats, text, audio, image, and video formats, the mbox format, and Java class files and archives.

Previously, Apache Tika was a sub-project of the Apache Lucene software library. Now it is distributed as a standalone package by the Apache Software Foundation.

last updated on:
October 14th, 2013, 9:19 GMT
license type:
The Apache License 2.0 
developed by:
The Apache Software Foundation
What's New in This Release:
  • Removed a test HTML file with a poorly chosen GPL text in it (TIKA-1129).
  • Improved tika-server so that it can produce text/html and text/xml content (TIKA-1126, TIKA-1127).
  • Improved the Compressor Parser to handle gzipped files that require the decompressConcatenated option to be set to true (TIKA-1096).
  • Fixed a typographic error that prevented detection of awk files (TIKA-1081).
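
The gzip item above concerns files made of several gzip members stored back to back. As a rough illustration of why such files need special handling, the following Python sketch uses only the standard library and is not Tika or parser-library code. The helper names `read_first_member` and `read_all_members` are hypothetical, and the assumption is that the decompressConcatenated option selects between these two behaviors.

```python
import gzip
import zlib

# Two independent gzip members written back to back,
# as concatenating two .gz files on disk would produce.
member_a = gzip.compress(b"first part\n")
member_b = gzip.compress(b"second part\n")
concatenated = member_a + member_b


def read_first_member(data: bytes) -> bytes:
    """Hypothetical helper: decode only the first gzip member, ignore the rest."""
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect gzip header and trailer
    return decoder.decompress(data)


def read_all_members(data: bytes) -> bytes:
    """Hypothetical helper: decode every gzip member in turn and join the results."""
    output = bytearray()
    while data:
        decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
        output += decoder.decompress(data)
        data = decoder.unused_data  # bytes left over after this member ends
    return bytes(output)


if __name__ == "__main__":
    print(read_first_member(concatenated))
    print(read_all_members(concatenated))
```

A reader that stops after the first member silently drops the remaining data, which is presumably the kind of loss the changelog entry addresses.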