Sherlock Holmes 4.0

A universal search engine.
Sherlock Holmes is a universal search engine: a system for gathering and indexing textual data (text files, web pages, etc.), both locally and over the network.

Main features:

  • Gathers documents via HTTP or from the local file system.
  • Parses text files, HTML, and PDF directly, and several other formats (such as MS Word and PostScript) using external parsers.
  • The whole system is modular, so adding your own data sources or parsers is just a matter of plugging in the right module (well, usually also writing it).
  • Works well in mixed-charset environments.
  • Treats multiple occurrences of the same file (even with minor changes) as a single document with multiple URLs.
  • Everything is highly configurable. You can write filtering rules in a special language that lets you tweak configuration variables depending on the document being processed.
  • Searches for words, phrases, and boolean expressions, including in file names and link texts.
  • Proximity search and proximity weighting of regular searches.
  • Recognition of languages and easy integration of stemmers and synonym dictionaries.
  • Spelling checker based on word frequencies observed in the indexed data, hinting to users that their query might be misspelled.
  • Search results include context in each document.
  • Scales well to tens of millions of documents on normal PC hardware.
  • The user interface (the front end) is completely separated from the rest of the system, making it easy to modify and to embed the search engine in existing applications.
  • Downloaded files and indices are compressed to save space.
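
The frequency-based spelling hint mentioned in the feature list can be illustrated with a small sketch. This is not Sherlock Holmes's actual implementation, only one common way to build such a hint: the names `edits1` and `suggest`, the `ratio` threshold, and the tiny sample word list are all assumptions made for illustration. The idea is to suggest a word one edit away from the query term when that word is much more frequent in the indexed data.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit away from `word`: delete, swap, replace, insert."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + swaps + replaces + inserts)

def suggest(word, freq, ratio=10):
    """Return a much more frequent near-miss of `word`, or None.

    `ratio` is a hypothetical threshold: the candidate must occur at
    least `ratio` times as often as the query word itself.
    """
    own = freq.get(word, 0)
    candidates = [w for w in edits1(word) if w in freq and w != word]
    if not candidates:
        return None
    best = max(candidates, key=lambda w: freq[w])
    return best if freq[best] >= ratio * max(own, 1) else None

# Hypothetical word counts standing in for frequencies seen in an index.
index_words = "search engine search index search engine indexing serach".split()
freq = Counter(index_words)

hint = suggest("serach", freq, ratio=3)
if hint is not None:
    print(f"Did you mean: {hint}?")
```

Because the hint is driven by the indexed data rather than a fixed dictionary, rare but correct terms are left alone unless a far more common neighbor exists; a real engine would likely tune the threshold and consider more than one edit.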

Last updated on: April 13th, 2009, 19:45 GMT
License type: GPL (GNU General Public License)
Developed by: Martin Mares