CorpusSearch 2 is a Java program that supports research in corpus linguistics by finding syntactic structures in a corpus of annotated sentence trees.
It is useful both as a research tool for searching syntactically annotated (parsed) corpora and as a development tool for constructing them.
Both the input and output files of CorpusSearch are ordinary text files, with syntactic annotations in Penn Treebank format.
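To illustrate what a bracketed annotation in this style looks like to a program, here is a minimal Java sketch that reads one labeled-bracket tree and lists its words. It is not CorpusSearch's own parser; the class name BracketTree, its methods, and the sample sentence are assumptions made up for this example, and real corpus files may carry extra material (IDs, comments, wrapper brackets) that this sketch does not handle specially.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CorpusSearch: parses one bracketed tree.
final class BracketTree {
    // A node is a labeled constituent with children, or a leaf (no children).
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    private final String text;
    private int pos = 0;

    private BracketTree(String text) { this.text = text; }

    static Node parse(String text) {
        return new BracketTree(text).parseNode();
    }

    private void skipSpace() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
    }

    // Reads characters up to the next bracket or whitespace.
    private String readToken() {
        int start = pos;
        while (pos < text.length()) {
            char c = text.charAt(pos);
            if (c == '(' || c == ')' || Character.isWhitespace(c)) break;
            pos++;
        }
        return text.substring(start, pos);
    }

    private Node parseNode() {
        skipSpace();
        if (pos >= text.length()) throw new IllegalArgumentException("unexpected end of input");
        if (text.charAt(pos) != '(') {
            return new Node(readToken()); // bare token: a word
        }
        pos++; // consume '('
        skipSpace();
        Node node = new Node(readToken()); // constituent label, e.g. NP-SBJ
        skipSpace();
        while (pos < text.length() && text.charAt(pos) != ')') {
            node.children.add(parseNode());
            skipSpace();
        }
        if (pos >= text.length()) throw new IllegalArgumentException("missing ')'");
        pos++; // consume ')'
        return node;
    }

    // Collects the labels of all childless nodes, left to right.
    static List<String> leaves(Node n) {
        List<String> out = new ArrayList<>();
        collect(n, out);
        return out;
    }

    private static void collect(Node n, List<String> out) {
        if (n.children.isEmpty()) { out.add(n.label); return; }
        for (Node c : n.children) collect(c, out);
    }

    public static void main(String[] args) {
        Node tree = parse("(IP-MAT (NP-SBJ (D The) (N dog)) (VBD barked))");
        System.out.println(tree.label);                    // prints "IP-MAT"
        System.out.println(String.join(" ", leaves(tree))); // prints "The dog barked"
    }
}
```

Because the format is plain text with balanced brackets, a small recursive reader like this is enough to recover the tree structure; the node labels in the sample are illustrative only.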
1. Download CS.jar.
2. Put the file in a convenient place.
3. Open a terminal.
4. Assuming that you have put CS.jar into the folder FOO, the following line will start CorpusSearch in any flavor of Unix that has Java installed (including Mac OS X):
% java -classpath /FOO/CS.jar csearch/CorpusSearch
Do not type the '%'; it stands for the terminal prompt. The example assumes Unix path syntax and that FOO is a top-level directory. The classpath must give the full path to CS.jar, using the path syntax appropriate to your system.
Here are some key features of "CorpusSearch":
· Tree search configurations in CS are defined in a Boolean query language over tree predicates.
· The output of a CS search is itself searchable.
· CS runs on any Java-supported platform.
· The CS query language contains many features to make searching easier and more intuitive for linguistic research.
· CS has extensive user configuration options.
Requirements:
· Java 2 Standard Edition Runtime Environment
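The idea behind the first feature listed above, a Boolean query language over tree predicates, can be sketched in Java as follows. This does not use or reproduce the CorpusSearch query syntax; the class TreePredicates and the helpers immediatelyDominates, dominates and anyNode are hypothetical names invented for this illustration, and each predicate here is evaluated independently over the whole tree, which is a simplification.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration, not CorpusSearch code.
final class TreePredicates {
    static final class Node {
        final String label;
        final List<Node> children;
        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // True if the test holds for n or for any node below it.
    static boolean anyNode(Node n, Predicate<Node> test) {
        if (test.test(n)) return true;
        for (Node c : n.children) {
            if (anyNode(c, test)) return true;
        }
        return false;
    }

    // True if some node labeled `parent` has a direct child labeled `child`.
    static Predicate<Node> immediatelyDominates(String parent, String child) {
        return root -> anyNode(root, n -> n.label.equals(parent)
                && n.children.stream().anyMatch(c -> c.label.equals(child)));
    }

    // True if some node labeled `ancestor` has a descendant, at any depth,
    // labeled `descendant`.
    static Predicate<Node> dominates(String ancestor, String descendant) {
        return root -> anyNode(root, n -> n.label.equals(ancestor)
                && n.children.stream()
                     .anyMatch(c -> anyNode(c, d -> d.label.equals(descendant))));
    }

    public static void main(String[] args) {
        Node tree = new Node("IP-MAT",
                new Node("NP-SBJ",
                        new Node("D", new Node("The")),
                        new Node("N", new Node("dog"))),
                new Node("VBD", new Node("barked")));

        // Boolean combination of tree predicates: AND, AND NOT.
        Predicate<Node> query = immediatelyDominates("IP-MAT", "NP-SBJ")
                .and(dominates("NP-SBJ", "N"))
                .and(immediatelyDominates("IP-MAT", "NP-OB1").negate());

        System.out.println(query.test(tree)); // prints "true"
    }
}
```

Each structural relation is an ordinary predicate on a tree, so and, or and negate from java.util.function.Predicate give the Boolean layer for free.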
What's New in This Release:
· Added extend_span to revision software.
· More cleaning up of "collapse".