Docco is a small personal document management system built on top of Lucene, Apache's indexing and search engine.
The tool can index local hard drives and anything mounted into the local file system, such as Windows or UNIX network drives. It scans for a number of different document formats and builds an index recording which words occur in which documents.
This allows very fast lookup of keywords and of other information such as author, title or location. The keywords are extracted from the document bodies, so no manual annotation is required.
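The word-to-document mapping described above is usually called an inverted index. The following is a minimal in-memory sketch of the idea, written for a modern JDK (8 or later) rather than the Java 1.4.2 this release targets. Docco itself delegates this to Lucene's on-disk index, and the class name InvertedIndexSketch is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative in-memory inverted index: maps each word to the documents containing it.
// Not Docco's code; Docco stores this information in a Lucene index instead.
public class InvertedIndexSketch {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Split a document body into lowercase words and record that each word occurs in docId.
    public void addDocument(String docId, String body) {
        for (String token : body.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Return all documents containing the keyword (case-insensitive), or an empty set.
    public Set<String> lookup(String keyword) {
        Set<String> docs = postings.get(keyword.toLowerCase(Locale.ROOT));
        return docs == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(docs);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("notes.txt", "Lucene builds an index of words.");
        index.addDocument("report.txt", "The report cites Lucene and PDF files.");
        System.out.println(index.lookup("lucene")); // [notes.txt, report.txt]
        System.out.println(index.lookup("pdf"));    // [report.txt]
    }
}
```

Because every word points directly at its documents, a keyword lookup is a single map access instead of a scan over all files, which is what makes the lookup fast.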
Docco supports the following formats:
· plain text
· OpenOffice/StarOffice 6.0 documents
· Word (with POI plugin)
· Excel (with POI plugin)
· PDF (with PDFBox or Multivalent plugin)
· UNIX man pages (with Multivalent plugin)
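Several formats above are only available when the matching plugin is installed. One common way to structure such a system is to pick a text extractor by file extension; the sketch below illustrates that pattern under the assumption of a hypothetical TextExtractor interface and ExtractorRegistry class. These are not Docco's actual plugin classes, and the POI, PDFBox and Multivalent APIs are not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical format dispatch: each format is handled by a pluggable extractor
// registered under a file extension. Names are illustrative, not Docco's API.
public class ExtractorRegistry {

    // Hypothetical plugin interface: turns a file's raw bytes into plain text for indexing.
    public interface TextExtractor {
        String extract(byte[] content);
    }

    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    public void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Returns the extractor for the file's extension, or empty if no plugin handles it
    // (for example, .doc files when no Word plugin has been registered).
    public Optional<TextExtractor> forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(byExtension.get(ext));
    }

    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry();
        // Plain text needs no plugin: decode the bytes directly.
        registry.register("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));
        System.out.println(registry.forFile("README.TXT").isPresent()); // true
        System.out.println(registry.forFile("budget.xls").isPresent()); // false
    }
}
```

Files whose extension has no registered extractor are simply skipped, which matches the idea that Word, Excel and PDF support depends on an optional plugin.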
Once an index has been created, the query interface lets you search for documents containing given keywords and shows how the results for these keywords overlap. Once a set of interesting documents has been found, they can be selected and displayed in a tree view, from which each document can be opened in its default application.
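To illustrate what "how the results overlap" can mean, the sketch below groups documents by the exact subset of query keywords they contain. It assumes the per-keyword result sets have already been fetched from the index; the class KeywordCombinations is hypothetical and does not reproduce Docco's actual query display.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration: given the documents matching each keyword, group the
// documents by which combination of keywords they contain.
public class KeywordCombinations {

    public static Map<Set<String>, Set<String>> combine(Map<String, Set<String>> docsByKeyword) {
        // Invert the query results: for each document, the keywords it matched.
        Map<String, Set<String>> keywordsByDoc = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : docsByKeyword.entrySet()) {
            for (String doc : e.getValue()) {
                keywordsByDoc.computeIfAbsent(doc, d -> new TreeSet<>()).add(e.getKey());
            }
        }
        // Group documents sharing the same keyword combination (sets are complete here,
        // so it is safe to use them as map keys).
        Map<Set<String>, Set<String>> docsByCombination = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : keywordsByDoc.entrySet()) {
            docsByCombination.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
        }
        return docsByCombination;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> docsByKeyword = new TreeMap<>();
        docsByKeyword.put("lucene", new TreeSet<>(Arrays.asList("a.txt", "b.pdf")));
        docsByKeyword.put("pdf", new TreeSet<>(Arrays.asList("b.pdf", "c.pdf")));
        System.out.println(combine(docsByKeyword));
        // prints {[lucene]=[a.txt], [lucene, pdf]=[b.pdf], [pdf]=[c.pdf]}
    }
}
```

Documents that match none of the keywords never appear in the per-keyword sets, so they are left out of the grouping.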
Requirements:
· Java 1.4.2 or later
What's New in This Release:
· symbolic links are no longer followed (Linux/UNIX)
· index locks are detected and can be removed by the user
· additional index information (contents, mappings) is now stored as soon as the index has been created, not only on shutdown. This means Docco can still access the index after an unclean exit (though the index will be locked)
· partial support for the RTF format
· nested diagrams can be created using a new button
· Lucene has been updated to version 1.9.1, and all code has been updated to remove deprecation warnings
· analyzers are now supported, which most importantly adds stop words and stemming for a number of languages. The analyzer is chosen per index, so Docco can query different directories with different language tools
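Not following symbolic links avoids indexing the same files twice and avoids endless loops through cyclic links. A minimal sketch of such a crawl using java.nio.file (Java 7 or later, so newer than the Java 1.4.2 this release requires, and not Docco's own code): Files.walkFileTree does not follow links unless FOLLOW_LINKS is passed, and it reports each link to visitFile with the link's own attributes. The class name NoSymlinkWalker is hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical crawler: collects regular files below a root without following symbolic links.
public class NoSymlinkWalker {

    public static List<Path> collectFiles(Path root) throws IOException {
        List<Path> files = new ArrayList<>();
        // walkFileTree(Path, FileVisitor) does not follow links by default.
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // A link shows up here with isSymbolicLink() true and isRegularFile() false,
                // so this check skips links to both files and directories.
                if (attrs.isRegularFile()) {
                    files.add(root.relativize(file));
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                return FileVisitResult.CONTINUE; // unreadable entries are skipped, not fatal
            }
        });
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : collectFiles(root)) {
            System.out.println(p);
        }
    }
}
```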
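The analyzer change means that indexing and querying pass text through the same language-specific processing, chosen per index. The sketch below shows that idea with a hypothetical TermAnalyzer interface (not Lucene's org.apache.lucene.analysis.Analyzer class) and a deliberately crude English analyzer: a tiny stop-word list plus plural stripping stands in for a real stemmer such as Porter or Snowball.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical per-index analyzer selection; not Docco's or Lucene's actual classes.
public class PerIndexAnalyzerSketch {

    public interface TermAnalyzer {
        List<String> analyze(String text);
    }

    // Toy English analyzer: lowercases, drops a few stop words, strips a plural "s".
    public static final class ToyEnglishAnalyzer implements TermAnalyzer {
        private static final Set<String> STOP_WORDS =
                new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

        @Override
        public List<String> analyze(String text) {
            List<String> terms = new ArrayList<>();
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (token.isEmpty() || STOP_WORDS.contains(token)) {
                    continue;
                }
                // "documents" -> "document", but "class" keeps its "ss" ending.
                if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
                    token = token.substring(0, token.length() - 1);
                }
                terms.add(token);
            }
            return terms;
        }
    }

    // Each index remembers its own analyzer, so indexing and querying stay consistent.
    private final Map<String, TermAnalyzer> analyzerByIndex = new HashMap<>();

    public void assign(String indexName, TermAnalyzer analyzer) {
        analyzerByIndex.put(indexName, analyzer);
    }

    public List<String> analyzeFor(String indexName, String text) {
        TermAnalyzer analyzer = analyzerByIndex.get(indexName);
        if (analyzer == null) {
            throw new IllegalArgumentException("no analyzer for index " + indexName);
        }
        return analyzer.analyze(text);
    }

    public static void main(String[] args) {
        PerIndexAnalyzerSketch sketch = new PerIndexAnalyzerSketch();
        sketch.assign("english-docs", new ToyEnglishAnalyzer());
        System.out.println(sketch.analyzeFor("english-docs", "The documents and the reports"));
        // prints [document, report]
    }
}
```

Attaching the analyzer to the index rather than to the application is what allows, say, an English directory and a German directory to be queried side by side, each with its own stop words and stemming.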