Terrier Changelog

New in version 3.5 (June 17th, 2011)

  • Indexing:
  • TR-117: Improve fields support by SimpleXMLCollection
  • TR-120: Error loading an additional MetaIndex structure (contributed by Javier Ortega, Universidad de Sevilla)
  • TR-106: Pipeline Query/Doc Policy Lifecycle (contributed by Giovanni Stilo, Università degli Studi dell'Aquila and Nestor Laboratory - University of Rome "Tor Vergata")
  • TR-116: Lexicon not properly renamed on Windows
  • TR-118: SimpleXMLCollection - the term near the closing tag is ignored (contributed by Damien Dudognon, Institut de Recherche en Informatique de Toulouse)
  • TR-123: Null pointer exception while trying to index simple document (contributed by Ilya Bogunov)
  • TR-126: Logging improvements
  • TR-124: When processing the docid tag in a MEDLINE format XML file, the XML context path is needed
  • TR-127: Easier refactoring of SinglePass indexers (contributed by Jonathon Hare, University of Southampton)
  • TR-108: Some indexers do not set the IterablePosting class for the DirectIndex (contributed by Richard Eckart de Castilho, Darmstadt University of Technology)
  • TR-136: Hadoop indexing misbehaves when terrier.index.prefix is not "data"
  • TR-137: TRECCollection cannot add properties from the document tags to the meta index at indexing time
  • TR-150: TRECCollection should parse DOCHDR tags, including URLs if they exist (see TRECWebCollection)
  • TR-138: IndexUtil.copyStructure fails when source and destination indices are the same
  • TR-140: Indexing support for query-biased summarisation
  • TR-144: CollectionRecordReader.next should not be recursive
  • TR-146, TR-148: Tokenisation should be done separately from Document parsing (the tokeniser can be set using the property tokeniser - see Non English language support in Terrier for more information on changing the tokenisation used by Terrier); Refactor Document implementations (e.g. TRECDocument and HTMLDocument are now deprecated in favour of the new TaggedDocument)
  • TR-147: Allow various Collection implementations to use different Document implementations
  • TR-158: Single pass indexing with default configuration doesn't ever flush memory
  • Retrieval:
  • TR-16, TR-166: Extending query language and Matching to support synonyms
  • TR-157: Remove TRECQuerying scripting files: trec.models, qemodels, trec.topics.list and trec.qrels - use properties in TRECQuerying instead.
  • TR-156: Deploy a DAAT matching strategy - see org.terrier.matching.daat (partially contributed by Nicola Tonellotto, CNR)
  • TR-113: The LGD Loglogistic weighting model (contributed by Gianni Amati, FUB)
  • TR-105: Index should check version number as it can't open older indices
  • TR-107: DirectIndex.getTerms() is broken
  • TR-110: TRECDocnoOutputFormat assumes metadata key is "docno"
  • TR-112: "Term not found" log message should not be a warning
  • TR-121: Distance.noTimesSameOrder() can throw ArrayIndexOutOfBoundsException
  • TR-129: Posting.getDocumentLength() does not work for postings from the direct file
  • TR-130: Manager should use Index specified in Request object
  • TR-131: Parsing of WeightingModel class names could be better
  • TR-132: Some BitIn implementations don't pass unit tests
  • TR-139: Manager should balk at null Index in constructor
  • TR-141: GammaFunction is not good enough for proximity - this fixes the retrieval effectiveness of DFRDependenceScoreModifier
  • TR-142: Matching implementations should not overwrite the EntryStatistics stored in the MatchingQueryTerms object
  • TR-143: BitFileBuffered creates unnecessary byte arrays
  • TR-145: ResultSet implementations don't retain exactResultSize() in child ResultSets
  • TR-149: Added first Divergence from Independence model
  • TR-153, TR-154, TR-155: Provide a Matching implementation that reads results from TREC run files (see TRECResultsMatching)
  • TR-160: Inv2DirectMultiReduce needs improvement to allow direct split across multiple files
  • TR-161: Use Tokenisers in query side tokenisation
  • TR-163: Index does not explicitly close the properties file
  • TR-164: Document index structure is left open when index.close() is called
  • TR-165: SingleLineTRECQuery opens all files as UTF
  • TR-167: Large document metadata are stored incorrectly by MetaIndex
  • Two new 2nd generation Divergence from Randomness models: JsKLs and XSqrA_M (contributed by Gianni Amati, Fondazione Ugo Bordoni)
  • Testing:
  • Added a considerable number of additional JUnit tests
  • TR-134: BitPostingIndexInputFormat needs a unit test
  • TR-135: TestPostingStructures should test skipping of stream structures
  • TR-151: SimpleFileCollection and chums (FileDocument etc) have no unit test
  • TR-159: JUnit end-to-end test for WT2G test collection
  • Desktop:
  • TR-103: Desktop search can't open files on 64-bit Windows
  • Other:
  • TR-168: Terrier batch scripts can fail when the TERRIER_HOME environment variable is set on 64-bit Windows
  • TR-115: Upgrade Hadoop support for 0.20
  • TR-104: Move to Java 6
  • TR-119: Temporary jar/properties in HDFS /tmp are not deleted
  • TR-152: TagSet should detect a tag in both process and skip entries

New in version 3.0 (March 11th, 2010)

  • Indexing:
  • TR-14, TR-42, TR-56, TR-102: Various changes to the format of the index, to promote reuse, scalability and speed.
  • TR-17, TR-50, TR-54, TR-77: Added MetaIndex for document metadata. DOCNOs etc. need not be in lexicographical order.
  • TR-43, TR-48, TR-69, TR-70: Fields should contain frequency information.
  • TR-39, TR-40, TR-41, TR-46, TR-50, TR-83, TR-88: Various improvements and bug fixes to MapReduce indexing.
  • TR-44, TR-55: Improve robustness of single-pass indexing.
  • TR-71, TR-98: Allow Bit posting structures to be split across multiple files.
  • TR-28, TR-91: Index WARC collections (UK-2006, ClueWeb09).
  • TR-34: Documentation update: Property values for single-pass indexing are not scaled.
  • TR-37, TR-38, TR-47, TR-57, TR-78, TR-79, TR-93, TR-94: Generate the direct file from an inverted index as a MapReduce job.
  • Retrieval:
  • TR-20, TR-42, TR-64: Access the posting list for one term as a stream - see Posting and IterablePosting.
  • TR-86: Matching should be an interface.
  • TR-87: PorterStemmer doesn't match expected output by Porter himself.
  • TR-81: Implements proximity term dependence models. For more information, see Configuring Retrieval.
  • TR-19: Support relevance feedback as well as pseudo-relevance feedback.
  • TR-68, TR-73, TR-74, TR-94: Implement field-based weighting models. For more information, see Configuring Retrieval.
  • TR-99: Provide way to integrate static doc prior easily. For more information, see Configuring Retrieval.
  • TR-90: MatchingQueryTerms does not retain query term order.
  • TR-26: Parse Million Query track topic files.
  • TR-49: Let TRECQuerying filename be predetermined by property.
  • TR-75: Allow the runtag to be set in runs.
  • TR-60: Removed PonteCroft language modelling.
  • TR-66, TR-84: Refactor TRECQuery.
  • TR-67: Request object should contain the Index.
  • TermScoreModifiers have been deprecated, and no longer work. You should use WeightingModel instead.
  • Testing:
  • Added a considerable number of end-to-end and unit tests.
  • TR-59: Fixed reset problem in Terrier evaluation tool.
  • TR-76: Bump JUnit version.
  • Desktop:
  • TR-61: Desktop example app should use MetaIndex.
  • Other:
  • TR-89: Check all .java and .sh files have Terrier license header.
  • TR-82: Have a simple webapps search results interface.
  • TR-80: Move code to org.terrier Java package namespaces.
  • TR-45: Add (read|write)(Delta|Golomb) etc to BitIn/BitOut.
  • TR-52: FSOrderedMapFile causes seek(-1) when searching for an entry less than the first.
  • TR-72: FSOrderedMapFile.EntryIterator.skip() breaks FSOrderedMapFile.EntryIterator.hasNext().
  • TR-95: FSArrayFile.ArrayFileIterator.skip() does not update entry index correctly.
  • TR-92: utility.io.CountingInputStream does not count single bytes correctly.
  • TR-53: Rounding.toString() doesn't work for 10dp.
  • TR-62: Files layer can transparently cache files.
  • TR-2, TR-65, TR-97: Replace Terrier's Makefile with Ant build.xml.
  • TR-63, TR-101: Documentation updates.
  • TR-100: Update default and sample terrier.properties files.

New in version 2.2 (December 26th, 2008)

  • This is a substantial update, which includes new support for Hadoop, primarily a Hadoop MapReduce indexing system, allowing large collections of documents to be indexed in a highly distributed fashion.
  • Also included are various minor improvements, including improved support for the IIT CDIP1 (TREC Legal track) collection, and various bug fixes.
  • This is intended to be the final release in the 2.x series.