Harvest is a system that collects information and makes it searchable through a web interface. It can collect information from the Internet and intranets over HTTP, FTP and NNTP, as well as from local files on hard disks, CD-ROMs and file servers.
The formats currently supported in addition to HTML include TeX, DVI, PS, full text, mail, man pages, news, troff, WordPerfect, RTF, Microsoft Word/Excel, SGML, C sources and many more. Stubs for PDF support are included in Harvest and use Xpdf or Acroread to process PDF files. Adding support for a new format is easy thanks to Harvest's modular design.
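To illustrate why a modular design makes new formats easy to add, here is a minimal sketch of a per-format summarizer registry. It is not Harvest's actual code or API: the names SUMMARIZERS, register, summarize_text, summarize_mail and summarize are all hypothetical, and the summaries are plain dictionaries rather than Harvest's own summary format.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a format name to a summarizer function.
# None of these names come from Harvest itself.
Summarizer = Callable[[str], Dict[str, str]]
SUMMARIZERS: Dict[str, Summarizer] = {}


def register(fmt: str) -> Callable[[Summarizer], Summarizer]:
    """Return a decorator that registers a summarizer for one format."""
    def decorator(func: Summarizer) -> Summarizer:
        SUMMARIZERS[fmt] = func
        return func
    return decorator


@register("text")
def summarize_text(content: str) -> Dict[str, str]:
    # Use the first line as the title and keep the full text as the body.
    lines = content.splitlines()
    return {"type": "text", "title": lines[0] if lines else "", "body": content}


@register("mail")
def summarize_mail(content: str) -> Dict[str, str]:
    # Split headers from the body at the first blank line,
    # then take the Subject header as the title.
    headers, _, body = content.partition("\n\n")
    subject = ""
    for line in headers.splitlines():
        if line.lower().startswith("subject:"):
            subject = line.split(":", 1)[1].strip()
    return {"type": "mail", "title": subject, "body": body}


def summarize(fmt: str, content: str) -> Dict[str, str]:
    """Dispatch to the summarizer registered for the given format."""
    if fmt not in SUMMARIZERS:
        raise ValueError(f"no summarizer registered for format: {fmt}")
    return SUMMARIZERS[fmt](content)
```

In this sketch, supporting another format means writing one more function and registering it. The dispatch code and the existing summarizers stay untouched.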
Harvest is a modular, distributed search system framework with a working set of components that make it a complete search system.
Here are some key features of Harvest:
· Harvest is designed to work as a distributed system. It can distribute the load among different machines, and several machines can be used to gather data. The full-text indexer doesn't have to run on the same machine as the broker or the web server.
· Harvest is designed to be modular. Every step in collecting data and answering search requests is implemented as a separate program. This makes it easy to modify or replace parts of Harvest to customize its behaviour.
· Harvest allows complete control over the content of the search database. The summarizer can be customized to create the summaries that will be used for searching. Harvest's filtering mechanism makes it possible to modify the summaries created by summarizers. Manually created summaries can also be inserted into the search database.
· The search interface is written in Perl to make customization easy, if desired.
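The features above can be pictured as a small pipeline: summarize each document, pass the summary through filters, store it in an index, and answer searches from that index. The sketch below is only an illustration under stated assumptions, not Harvest's implementation. Every name in it (summarize, drop_private, Index, build_index) and the example URLs are hypothetical, and in Harvest each step is a separate program rather than a function in one process.

```python
from typing import Callable, Dict, List, Optional

# A summary is modelled as a plain dictionary (an assumption for this sketch).
Summary = Dict[str, str]
Filter = Callable[[Summary], Optional[Summary]]


def summarize(url: str, content: str) -> Summary:
    """Hypothetical summarizer: keep the URL and lowercased text as keywords."""
    return {"url": url, "keywords": content.lower()}


def drop_private(summary: Summary) -> Optional[Summary]:
    """Hypothetical filter: reject summaries whose URL is under /private/."""
    if "/private/" in summary["url"]:
        return None
    return summary


class Index:
    """Hypothetical in-memory stand-in for the search database."""

    def __init__(self) -> None:
        self.records: List[Summary] = []

    def add(self, summary: Summary) -> None:
        self.records.append(summary)

    def search(self, term: str) -> List[str]:
        # Return the URLs of all records whose keywords contain the term.
        needle = term.lower()
        return [r["url"] for r in self.records if needle in r["keywords"]]


def build_index(documents: Dict[str, str], filters: List[Filter]) -> Index:
    """Summarize each document, apply the filters in order, index survivors."""
    index = Index()
    for url, content in documents.items():
        summary: Optional[Summary] = summarize(url, content)
        for apply_filter in filters:
            summary = apply_filter(summary)
            if summary is None:
                break  # a filter rejected this summary
        if summary is not None:
            index.add(summary)
    return index


docs = {
    "http://example.org/a": "Harvest indexing notes",
    "http://example.org/private/b": "Harvest internal memo",
}
index = build_index(docs, [drop_private])

# A manually written summary is inserted directly, bypassing the summarizer.
index.add({"url": "manual://note", "keywords": "harvest handwritten summary"})

print(index.search("harvest"))
```

Because each stage only exchanges summaries with the next, any stage can be swapped out without touching the others, which is the point the feature list makes about modularity.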
What's New in This Release:
· src/common/qdbm: updated to qdbm-1.8.20.
· components/broker/zebra/yaz: merged yaz-2.0.30.