Apache Tika

Version 1.4, The Apache License 2.0

A free and open source content analysis toolkit distributed by the Apache Software Foundation

Apache Tika is an open source toolkit that detects document types and extracts metadata and structured text content from many kinds of documents by delegating to existing parser libraries.

Apache Tika supports the following document formats: HyperText Markup Language (HTML), XML and derived formats, Microsoft Office document formats, OpenDocument Format (ODF), Portable Document Format (PDF), Electronic Publication Format (EPUB), Rich Text Format (RTF), compression and packaging formats, text/audio/image/video formats, the mbox format, and Java class files and archives.
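
To give a feel for what type detection involves, here is a minimal sketch in plain Java that guesses a media type from the first bytes of a file. It is not Apache Tika's implementation and uses none of its API: the class MagicSniffer and its sniff method are hypothetical names made up for this illustration, and the check covers only a handful of the formats listed above. Tika's own detection is far more thorough.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of Apache Tika. */
final class MagicSniffer {

    private MagicSniffer() {}

    /** Guesses a coarse media type from the leading bytes of a document. */
    static String sniff(byte[] head) {
        // PDF files begin with the ASCII text "%PDF-".
        if (startsWith(head, ascii("%PDF-"))) return "application/pdf";
        // RTF files begin with "{\rtf".
        if (startsWith(head, ascii("{\\rtf"))) return "application/rtf";
        // ZIP containers begin with "PK" 0x03 0x04. ODF, EPUB and JAR files
        // are ZIP-based, so telling them apart needs a look inside the archive.
        if (startsWith(head, new byte[] {'P', 'K', 3, 4})) return "application/zip";
        // Java class files begin with 0xCAFEBABE (other formats share this
        // magic number, so a real detector checks further).
        if (startsWith(head, new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE})) {
            return "application/java-vm";
        }
        // mbox files begin with a "From " separator line.
        if (startsWith(head, ascii("From "))) return "application/mbox";
        // Unknown content falls back to the generic binary type.
        return "application/octet-stream";
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sniff(ascii("%PDF-1.4\n")));
        System.out.println(sniff(ascii("{\\rtf1\\ansi")));
    }
}
```

A toolkit like Tika goes beyond this kind of prefix check, which is why ZIP-based formats such as ODF and EPUB all come back as plain application/zip in the sketch.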

Apache Tika was previously a sub-project of Apache Lucene. It is now distributed as a standalone project by the Apache Software Foundation.
Last updated on October 14th, 2013

Apache Tika - Using Apache Tika in an Ant project
Apache Tika - Using Apache Tika as a command-line utility
