Apache Tika 1.4

A free and Open Source content analysis toolkit distributed by the Apache Software Foundation


What's new in Apache Tika 1.4:

  • Removed a test HTML file with a poorly chosen GPL text in it (TIKA-1129).
  • Improvements to tika-server to allow it to produce text/html and text/xml content (TIKA-1126, TIKA-1127).
  • Improvements were made to the Compressor Parser to handle gzipped files that require the decompressConcatenated option set to true (TIKA-1096).
  • Fixed a typographic error that was preventing the detection of awk files (TIKA-1081).
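
The compressor entry above concerns gzip files made of several members written back to back, where a reader that stops after the first member silently drops the rest. The following is a minimal sketch of that difference using only the Python standard library, not Tika or its parser libraries. The function name decompress_gzip and its decompress_concatenated parameter are hypothetical and only borrow the name of the option mentioned in the changelog.

```python
import gzip
import zlib


def decompress_gzip(data: bytes, decompress_concatenated: bool) -> bytes:
    """Decompress gzip data, optionally continuing past the first member."""
    out = []
    while data:
        # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        member = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
        out.append(member.decompress(data))
        out.append(member.flush())
        # Bytes that follow the end of this member are left in unused_data.
        data = member.unused_data
        if not decompress_concatenated:
            break
    return b"".join(out)


# Two independently gzipped chunks written back to back form one valid gzip file.
concatenated = gzip.compress(b"first part\n") + gzip.compress(b"second part\n")

first_only = decompress_gzip(concatenated, decompress_concatenated=False)
everything = decompress_gzip(concatenated, decompress_concatenated=True)
```

With the flag off, only the first member's bytes come back; with it on, the loop keeps reading until the input is exhausted.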
LICENSE TYPE:
The Apache License 2.0 
DEVELOPED BY:
The Apache Software Foundation
HOMEPAGE:
tika.apache.org
2 Apache Tika Screenshots:
Apache Tika - Using Apache Tika in an Ant project
Apache Tika - Using Apache Tika as a command-line utility
Apache Tika is an open source toolkit designed to detect and extract metadata and structured text content from many kinds of documents, using existing parser libraries.

Apache Tika supports the following document formats: HyperText Markup Language (HTML), XML and derived formats, Microsoft Office document formats, OpenDocument Format (ODF), Portable Document Format (PDF), Electronic Publication Format (EPUB), Rich Text Format (RTF), compression and packaging formats, text, audio, image and video formats, the mbox format, and Java class files and archives.
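
To illustrate what detecting a document type can involve, here is a minimal sketch that recognizes a few of the formats listed above by their leading bytes. It is a toy example in standard-library Python, not Tika's API or its actual detection logic. The names SIGNATURES and detect_type are hypothetical, and the media type labels are illustrative.

```python
from typing import Optional

# Leading byte signatures for a few of the formats listed above.
# ZIP is a container: ODF, EPUB and Java archives all start with this
# signature, so a real detector has to look inside the archive as well.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"{\\rtf", "application/rtf"),
    (b"\x1f\x8b", "application/gzip"),
    (b"PK\x03\x04", "application/zip"),
]


def detect_type(head: bytes) -> Optional[str]:
    """Return a media type label for the first matching signature, else None."""
    for magic, media_type in SIGNATURES:
        if head.startswith(magic):
            return media_type
    return None
```

Passing the first few bytes of a file, for example `detect_type(open(path, "rb").read(8))`, is enough for these signatures. Formats without a fixed prefix, such as plain text, need other heuristics.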

Apache Tika was previously a subproject of Apache Lucene. It is now distributed as a standalone package by the Apache Software Foundation.

Last updated on October 14th, 2013


