Softpedia
 



    Terrier 3.5

    Downloads: 1,532
    User Rating: Fair (2.3/5), rated by 22 user(s)
    Developer: University of Glasgow
    License / Price: MPL / FREE
    Last Updated: June 17th, 2011, 08:03 GMT
    Category: ROOT / Information Management


    Terrier description

    A probabilistic Java toolkit for building search engines.

    Terrier is software for the rapid development of Web, intranet, and desktop search engines.

    More generally, it is a modular platform for building large-scale information retrieval applications, providing indexing and probabilistic retrieval functionality.

    Terrier has various cutting-edge features including parameter-free probabilistic retrieval approaches (such as Divergence from Randomness models), automatic query expansion/re-formulation methodologies, and efficient data compression techniques.
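To make the "data compression techniques" concrete, here is a minimal sketch of the textbook unary and Elias gamma integer codes, which the feature list below names as supported encodings. The class and method names (`GammaCodeSketch`, `unary`, `gamma`, `decodeGamma`) are hypothetical and are not Terrier's API. Bits are shown as strings of '0' and '1' for readability, and Terrier's actual on-disk bit layout may differ.

```java
/** Illustrative only: textbook unary and Elias gamma codes, not Terrier's implementation. */
final class GammaCodeSketch {

    /** Unary code for n >= 1: (n - 1) zero bits followed by a single one bit. */
    static String unary(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < n - 1; i++) {
            bits.append('0');
        }
        return bits.append('1').toString();
    }

    /** Elias gamma code for n >= 1: floor(log2 n) zero bits, then n in binary. */
    static String gamma(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        String binary = Integer.toBinaryString(n); // always starts with '1'
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < binary.length() - 1; i++) {
            bits.append('0'); // length prefix: one zero per bit after the leading one
        }
        return bits.append(binary).toString();
    }

    /** Decodes a single gamma-coded value from the start of the bit string. */
    static int decodeGamma(String bits) {
        int zeros = 0;
        while (bits.charAt(zeros) == '0') {
            zeros++;
        }
        // The value occupies the next (zeros + 1) bits, starting at the first '1'.
        return Integer.parseInt(bits.substring(zeros, 2 * zeros + 1), 2);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 5, 9}) {
            System.out.println(n + " -> gamma " + gamma(n) + ", unary " + unary(n));
        }
    }
}
```

Small numbers get short codes, which is why such schemes suit the small gaps between consecutive document identifiers in a posting list.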

    Terrier comes with a proof-of-concept desktop search application and full TREC capabilities, including the ability to index, query and evaluate the standard TREC collections, such as AP, WSJ, WT10G, .GOV and .GOV2.
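As an illustration of what evaluating a TREC run involves, the sketch below computes average precision for a single query, one of the standard ad hoc measures. It is a standalone example under stated assumptions: the names `AveragePrecisionSketch` and `averagePrecision` are hypothetical, and this is not Terrier's evaluation code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: average precision for one query, not Terrier's evaluation code. */
final class AveragePrecisionSketch {

    /**
     * @param ranking  document identifiers in the order the system returned them
     * @param relevant identifiers judged relevant for the query (the "qrels")
     * @return mean of the precision values at each rank where a relevant document
     *         appears, divided by the total number of relevant documents
     */
    static double averagePrecision(List<String> ranking, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double precisionSum = 0.0;
        for (int rank = 1; rank <= ranking.size(); rank++) {
            if (relevant.contains(ranking.get(rank - 1))) {
                hits++;
                precisionSum += (double) hits / rank; // precision at this rank
            }
        }
        // Relevant documents that were never retrieved contribute zero.
        return precisionSum / relevant.size();
    }

    public static void main(String[] args) {
        List<String> ranking = Arrays.asList("d1", "d2", "d3", "d4");
        Set<String> relevant = new HashSet<>(Arrays.asList("d1", "d3", "d9"));
        System.out.println(averagePrecision(ranking, relevant));
    }
}
```

Averaging this value over all queries of a test collection gives mean average precision (MAP).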

    Terrier is written in Java and has been successfully used for ad hoc retrieval, Web search and cross-language retrieval, in a centralised or distributed setting.

    Currently, it is also being used for running various applications.



    Here are some key features of "Terrier":

    · Open Source (Mozilla Public Licence)
    · Written in cross-platform Java
    · Highly compressed disk data structures.
    · Handling large-scale document collections.
    · Direct file for efficient query expansion.
    · Modular and open indexing and querying APIs.
    · Testbed for indexing and retrieval from standard TREC test collections.
    · Interactive querying application.
    · Desktop search application for searching various types of documents.
    · Input/output of gamma, unary and binary encoded integers for compressing streams or random access files.
    · Standard evaluation of TREC ad-hoc and known-item search retrieval results.
    · Indexing of tagged document collections, as well as documents of various formats, such as HTML, PDF, or Microsoft Word, Excel and PowerPoint files.
    · Indexing of field information.
    · Indexing of position information at the word or block level.
    · Support for classic retrieval models, such as tf-idf, BM25 and Ponte-Croft language model, and Rocchio's query expansion.
    · Provides a number of Divergence From Randomness (DFR) document ranking models.
    · Provides a number of parameter-free DFR term weighting models for automatic query expansion.
    · Advanced query language that supports AND/NOT operators, phrase and proximity search.
    · Flexible processing of terms through a pipeline of components, such as stop-words removers and stemmers.
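Of the classic retrieval models named in the list above, BM25 is easy to show in a few lines. The sketch below uses the common textbook form of the per-term score with assumed parameter values k1 = 1.2 and b = 0.75. The names `Bm25Sketch` and `termScore` are hypothetical, and Terrier's own BM25 class may use a different formulation and different defaults.

```java
/** Illustrative only: textbook BM25 per-term score, not Terrier's weighting model class. */
final class Bm25Sketch {

    static final double K1 = 1.2;  // term-frequency saturation (assumed value)
    static final double B = 0.75;  // strength of document-length normalisation (assumed value)

    /**
     * @param tf           frequency of the term in the document
     * @param docLength    length of the document in tokens
     * @param avgDocLength average document length in the collection
     * @param numDocs      number of documents in the collection
     * @param docFreq      number of documents containing the term
     */
    static double termScore(double tf, double docLength, double avgDocLength,
                            long numDocs, long docFreq) {
        if (tf <= 0) {
            return 0.0;
        }
        // Robertson/Sparck Jones inverse document frequency.
        double idf = Math.log((numDocs - docFreq + 0.5) / (docFreq + 0.5));
        // Longer-than-average documents are penalised, shorter ones boosted.
        double norm = K1 * (1.0 - B + B * docLength / avgDocLength);
        return idf * (tf * (K1 + 1.0)) / (tf + norm);
    }

    public static void main(String[] args) {
        // A term occurring twice in an average-length document, found in 10 of 1000 documents.
        System.out.println(termScore(2, 100, 100, 1000, 10));
    }
}
```

A document's score for a query is the sum of `termScore` over the query terms it contains. Note that this form of idf becomes negative for terms occurring in more than half of the collection.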

    What's New in This Release:

    Indexing:
    · TR-117: Improve fields support by SimpleXMLCollection
    · TR-120: Error loading an additional MetaIndex structure (contributed by Javier Ortega, Universidad de Sevilla)
    · TR-106: Pipeline Query/Doc Policy Lifecycle (contributed by Giovanni Stilo, Università degli Studi dell'Aquila and Nestor Laboratory - University of Rome "Tor Vergata")
    · TR-116: Lexicon not properly renamed on Windows
    · TR-118: SimpleXMLCollection - the term near the closing tag is ignored (contributed by Damien Dudognon, Institut de Recherche en Informatique de Toulouse)
    · TR-123: Null pointer exception while trying to index simple document (contributed by Ilya Bogunov)
    · TR-126: Logging improvements
    · TR-124: When processing docid tag in MEDLINE format XML file, xml context path is needed
    · TR-127: Easier refactoring of SinglePass indexers (contributed by Jonathon Hare, University of Southampton)
    · TR-108: Some indexers do not set the IterablePosting class for the DirectIndex (contributed by Richard Eckart de Castilho, Darmstadt University of Technology)
    · TR-136: Hadoop indexing misbehaves when terrier.index.prefix is not "data"
    · TR-137: TRECCollection cannot add properties from the document tags to the meta index at indexing time
    · TR-150: TRECCollection parse DOCHDR tags, including URLs should they exist (see TRECWebCollection)
    · TR-138: IndexUtil.copyStructure fails when source and destination indices are same
    · TR-140: Indexing support for query-biased summarisation
    · TR-144: CollectionRecordReader.next should not be recursive
    · TR-146, TR-148: Tokenisation should be done separately from Document parsing (the tokeniser can be set using the property tokeniser - see Non English language support in Terrier for more information on changing the tokenisation used by Terrier); Refactor Document implementations (e.g. TRECDocument and HTMLDocument are now deprecated in favour of the new TaggedDocument)
    · TR-147: Allow various Collection implementations to use different Document implementations
    · TR-158: Single pass indexing with default configuration doesn't ever flush memory

    Retrieval:
    · TR-16, TR-166: Extending query language and Matching to support synonyms
    · TR-157: Remove TRECQuerying scripting files: trec.models, qemodels, trec.topics.list and trec.qrels - use properties in TRECQuerying instead.
    · TR-156: Deploy a DAAT matching strategy - see org.terrier.matching.daat (partially contributed by Nicola Tonellotto, CNR)
    · TR-113: The LGD Loglogistic weighting model (contributed by Gianni Amati, FUB)
    · TR-105: Index should check version number as it can't open older indices
    · TR-107: DirectIndex.getTerms() is broken
    · TR-110: TRECDocnoOutputFormat assumes metadata key is "docno"
    · TR-112: "Term not found" log message should not be a warning
    · TR-121: Distance.noTimesSameOrder() can throw ArrayIndexOutOfBoundsException
    · TR-129: Posting.getDocumentLength() does not work for postings from the direct file
    · TR-130: Manager should use Index specified in Request object
    · TR-131: Parsing of WeightingModel class names could be better
    · TR-132: Some BitIn implementations don't pass unit tests
    · TR-139: Manager should balk at null Index in constructor
    · TR-141: GammaFunction is not good enough for proximity - this fixes the retrieval effectiveness of DFRDependenceScoreModifier
    · TR-142: Matching implementations should not overwrite the EntryStatistics stored in the MatchingQueryTerms object
    · TR-143: BitFileBuffered creates unnecessary byte arrays
    · TR-145: ResultSet implementations don't retain exactResultSize() in child ResultSets
    · TR-149: Added first Divergence from Independence model
    · TR-153, TR-154, TR-155: Provide a Matching implementation that reads results from TREC run files (see TRECResultsMatching)
    · TR-160: Inv2DirectMultiReduce needs improvement to allow direct split across multiple files
    · TR-161: Use Tokenisers in query side tokenisation
    · TR-163: Index does not explicitly close the properties file
    · TR-164: Document index structure is left open when index.close() is called
    · TR-165: SingleLineTRECQuery opens all files as UTF
    · TR-167: Large document metadata are stored incorrectly by MetaIndex
    · Two new 2nd generation Divergence from Randomness models: JsKLs and XSqrA_M (contributed by Gianni Amati, Fondazione Ugo Bordoni)
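For readers unfamiliar with the document-at-a-time (DAAT) strategy mentioned under TR-156, the following is a minimal sketch of the general idea: all posting lists are traversed in parallel in document-identifier order, each document is fully scored before moving on, and only the best k results are kept. All names here (`DaatSketch`, `PostingList`, `Result`, `topK`) are hypothetical, and this is not the code in org.terrier.matching.daat.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only: a generic document-at-a-time traversal, not Terrier's implementation. */
final class DaatSketch {

    /** Postings for one query term: ascending document ids with a precomputed score each. */
    static final class PostingList {
        final int[] docIds;
        final double[] scores;
        int pos = 0; // cursor into the arrays

        PostingList(int[] docIds, double[] scores) {
            this.docIds = docIds;
            this.scores = scores;
        }

        boolean exhausted() { return pos >= docIds.length; }
        int currentDoc() { return docIds[pos]; }
    }

    static final class Result {
        final int docId;
        final double score;

        Result(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Returns the k highest-scoring documents, best first. */
    static List<Result> topK(List<PostingList> lists, int k) {
        // Min-heap on score, so the weakest retained result is evicted first.
        PriorityQueue<Result> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Result r) -> r.score));
        while (true) {
            // Find the smallest document id under any cursor.
            int next = Integer.MAX_VALUE;
            boolean any = false;
            for (PostingList pl : lists) {
                if (!pl.exhausted()) {
                    next = Math.min(next, pl.currentDoc());
                    any = true;
                }
            }
            if (!any) {
                break; // every list is exhausted
            }
            // Score that document completely, advancing each list that contains it.
            double score = 0.0;
            for (PostingList pl : lists) {
                if (!pl.exhausted() && pl.currentDoc() == next) {
                    score += pl.scores[pl.pos];
                    pl.pos++;
                }
            }
            heap.add(new Result(next, score));
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<Result> results = new ArrayList<>(heap);
        results.sort(Comparator.comparingDouble((Result r) -> r.score).reversed());
        return results;
    }
}
```

Because each document is finished before the next begins, memory use depends on k rather than on the number of matching documents, which is the usual motivation for DAAT over term-at-a-time accumulation.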

    Testing:
    · Added a considerable number of additional JUnit tests
    · TR-134: BitPostingIndexInputFormat needs a unit test
    · TR-135: TestPostingStructures should test skipping of stream structures
    · TR-151: SimpleFileCollection and chums (FileDocument etc) have no unit test
    · TR-159: JUnit end-to-end test for WT2G test collection

    Desktop:
    · TR-103: Desktop search can't open files on 64-bit Windows

    Other:
    · TR-168: Terrier batch scripts can fail when the TERRIER_HOME environment variable is set on 64-bit Windows
    · TR-115: Upgrade Hadoop support for 0.20
    · TR-104: Move to Java 6
    · TR-119: Temporary jar/properties in HDFS /tmp are not deleted
    · TR-152: TagSet should detect a tag in both process and skip entries

      

