CorpusSearch 2.002.71

A tool that finds syntactic structures in a corpus
CorpusSearch 2 (CS) is a Java program that supports research in corpus linguistics. It finds syntactic structures in a corpus of annotated sentence trees, and it is useful both as a research tool for searching syntactically annotated (parsed) corpora and as a development tool for building them.

Both the input and output files of CorpusSearch are ordinary text files, with syntactic annotations in the Penn Treebank format, that is, labeled bracketings such as (NP (DT the) (NN dog)).


Installation:

1. Download CS.jar.
2. Put the file in a convenient place.
3. Open a terminal.
4. Assuming that you have put CS.jar into the folder FOO, the following line will start CorpusSearch in any flavor of Unix that has Java installed (including Mac OS X):

% java -classpath /FOO/CS.jar csearch/CorpusSearch

Don't type the '%'; it stands for the terminal prompt. This example assumes Unix path syntax and that FOO is a top-level directory. The classpath must give the full path to CS.jar, written in the path syntax of your operating system.
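
The launch line can also be wrapped in a small shell function so that the full-path requirement is checked before anything runs. The following is a minimal sketch, assuming a POSIX shell; the function name cs_command is hypothetical and not part of CorpusSearch, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of CorpusSearch): print the launch
# command for a given location of CS.jar.
cs_command() {
    jar_path="$1"
    # The classpath must be a full path, so reject anything that
    # does not start with "/" (Unix path syntax is assumed here).
    case "$jar_path" in
        /*) ;;
        *)  echo "error: give the full path to CS.jar" >&2
            return 1 ;;
    esac
    # Same command as in step 4, with the jar path filled in.
    printf 'java -classpath %s csearch/CorpusSearch\n' "$jar_path"
}
```

For example, cs_command /FOO/CS.jar prints the line from step 4, while cs_command CS.jar reports an error because the path is relative. This sketch does not quote the path, so a folder name containing spaces would need extra care.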

Main features:

  • Tree search configurations in CS are defined in a Boolean query language over tree predicates.
  • The output of a CS search is itself searchable.
  • CS runs on any Java-supported platform.
  • The CS query language contains many features to make searching easier and more intuitive for linguistic research.
  • CS has extensive user configuration options.

last updated on: February 18th, 2010, 18:50 GMT
license type: GPL (GNU General Public License)
developed by: Beth Randall

What's New in version 2.002.68
  • Added extend_span to revision software.
  • More cleaning up of "collapse".