Terrier 3.5

A probabilistic Java toolkit for building search engines.

Terrier is software for the rapid development of Web, intranet, and desktop search engines.

More generally, it is a modular platform for building large-scale information retrieval applications, providing indexing and probabilistic retrieval functionality.

It comes with a desktop search application.

Terrier has various cutting-edge features including parameter-free probabilistic retrieval approaches (such as Divergence from Randomness models), automatic query expansion/re-formulation methodologies, and efficient data compression techniques.
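
To make the compression techniques mentioned above more concrete, here is a minimal sketch of unary and Elias gamma coding of positive integers, the kind of codes the feature list refers to. It is not Terrier's own code: the class and method names (GammaCodingSketch, unary, gamma, decodeGammaStream) are hypothetical, bits are kept in a String for readability rather than packed into bytes, and the unary convention shown (ones terminated by a zero) is only one of several in use.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names are hypothetical and not part of Terrier.
final class GammaCodingSketch {

    /** Unary code of x >= 1: (x - 1) one-bits followed by a terminating zero-bit. */
    static String unary(int x) {
        if (x < 1) throw new IllegalArgumentException("x must be >= 1");
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < x; i++) sb.append('1');
        return sb.append('0').toString();
    }

    /** Elias gamma code of x >= 1: n zero-bits, then x in (n + 1) binary digits, n = floor(log2 x). */
    static String gamma(int x) {
        if (x < 1) throw new IllegalArgumentException("x must be >= 1");
        int n = 31 - Integer.numberOfLeadingZeros(x); // floor(log2 x)
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append('0');
        for (int i = n; i >= 0; i--) sb.append(((x >>> i) & 1) == 1 ? '1' : '0');
        return sb.toString();
    }

    /** Decodes a concatenation of gamma codes; assumes the input is well formed. */
    static List<Integer> decodeGammaStream(String bits) {
        List<Integer> out = new ArrayList<>();
        int pos = 0;
        while (pos < bits.length()) {
            int n = 0;
            while (bits.charAt(pos) == '0') { n++; pos++; } // count the length prefix
            int value = 0;
            for (int i = 0; i <= n; i++) {
                value = (value << 1) | (bits.charAt(pos++) - '0');
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        String stream = gamma(1) + gamma(5) + gamma(9);
        System.out.println(stream + " decodes to " + decodeGammaStream(stream));
    }
}
```

Small values get short codes, which is why codes like these suit the small gaps between successive document identifiers in a posting list.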

Terrier comes with a powerful proof-of-concept desktop search application and full TREC capabilities, including the ability to index, query, and evaluate the standard TREC collections, such as AP, WSJ, WT10G, .GOV, and .GOV2.

Terrier is written in Java and has been used successfully for ad hoc retrieval, Web search, and cross-language retrieval, in both centralised and distributed settings.

Currently, it is also being used for running various applications.

Main features:

  • Open source (Mozilla Public License).
  • Written in cross-platform Java.
  • Highly compressed disk data structures.
  • Handling large-scale document collections.
  • Direct file for efficient query expansion.
  • Modular and open indexing and querying APIs.
  • Testbed for indexing and retrieval from standard TREC test collections.
  • Interactive querying application.
  • Desktop search application for searching various types of documents.
  • Input/output of gamma, unary and binary encoded integers for compressing streams or random access files.
  • Standard evaluation of TREC ad-hoc and known-item search retrieval results.
  • Indexing of tagged document collections, as well as documents of various formats, such as HTML, PDF, or Microsoft Word, Excel and PowerPoint files.
  • Indexing of field information.
  • Indexing of position information at the word or block level.
  • Support for classic retrieval models, such as tf-idf, BM25 and the Ponte-Croft language model, and for Rocchio's query expansion.
  • A number of Divergence From Randomness (DFR) document ranking models.
  • A number of parameter-free DFR term weighting models for automatic query expansion.
  • Advanced query language that supports AND/NOT operators, phrase and proximity search.
  • Flexible processing of terms through a pipeline of components, such as stop-words removers and stemmers.
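
As an illustration of one of the classic retrieval models listed above, here is a minimal sketch of the textbook BM25 score for a single query term. It is not taken from Terrier: the class and method names (Bm25Sketch, idf, termScore) are hypothetical, the constants k1 = 1.2 and b = 0.75 are common defaults chosen here as an assumption, and the toolkit's own BM25 implementation may use a different formulation.

```java
// Illustrative sketch only; names and constants are assumptions, not Terrier's API.
final class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation (assumed default)
    static final double B = 0.75;  // strength of document-length normalisation (assumed default)

    /** Robertson/Sparck Jones style IDF; negative when the term occurs in more than half the documents. */
    static double idf(long numDocs, long docFreq) {
        return Math.log((numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** BM25 contribution of one term occurring tf times in a document of length docLen. */
    static double termScore(double tf, double docLen, double avgDocLen,
                            long numDocs, long docFreq) {
        double lengthNorm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf(numDocs, docFreq) * tf * (K1 + 1.0) / (tf + lengthNorm);
    }

    public static void main(String[] args) {
        // A term seen 3 times in a 120-token document, in a collection of
        // 1000 documents (average length 100) where 50 documents contain the term.
        System.out.println(termScore(3, 120, 100, 1000, 50));
    }
}
```

A document's score for a whole query would be the sum of termScore over the query terms it contains. The score grows with term frequency but saturates, and longer-than-average documents are penalised.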

last updated on: June 17th, 2011, 8:03 GMT
price: Free
developed by: University of Glasgow
homepage: ir.dcs.gla.ac.uk
license type: MPL (Mozilla Public License)
category: Information Management

What's New in This Release:
Indexing:
  • TR-117: Improve fields support by SimpleXMLCollection
  • TR-120: Error loading an additional MetaIndex structure (contributed by Javier Ortega, Universidad de Sevilla)
  • TR-106: Pipeline Query/Doc Policy Lifecycle (contributed by Giovanni Stilo, University degli Studi dell'Aquila and Nestor Laboratory - University of Rome "Tor Vergata")