The Computational Linguistics Toolset is a set of tools for computational linguistics. The project contains reusable code for cleaning, splitting, refining, and sampling corpora (ICE, Penn, and a native one), for tagging them with the TnT tagger, and for doing permutation statistics on N-grams. The tools themselves are well documented.
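To make the idea of permutation statistics on N-grams concrete, here is a minimal Python sketch. It is not the toolset's own code: the function names, the distance measure (sum of absolute differences in relative N-gram frequency), and the input shape (each corpus as a list of token lists) are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def ngram_counts(sentences, n):
    """Count n-grams (as tuples) over a list of token lists."""
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def frequency_distance(corpus_a, corpus_b, n):
    """Sum of absolute differences between relative n-gram frequencies.

    Hypothetical test statistic; the real tools may use a different one.
    """
    counts_a = ngram_counts(corpus_a, n)
    counts_b = ngram_counts(corpus_b, n)
    total_a = sum(counts_a.values()) or 1  # avoid division by zero
    total_b = sum(counts_b.values()) or 1
    grams = set(counts_a) | set(counts_b)
    return sum(abs(counts_a[g] / total_a - counts_b[g] / total_b) for g in grams)


def permutation_p_value(corpus_a, corpus_b, n=2, rounds=1000, seed=0):
    """Estimate how often a random split of the pooled sentences differs
    at least as much as the observed split does."""
    rng = random.Random(seed)
    observed = frequency_distance(corpus_a, corpus_b, n)
    pooled = list(corpus_a) + list(corpus_b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        left, right = pooled[:len(corpus_a)], pooled[len(corpus_a):]
        if frequency_distance(left, right, n) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (rounds + 1)
```

A small p-value suggests that the two corpora differ in their N-gram distribution more than a random division of the same sentences would. Sentences are shuffled as whole units so that N-grams never cross sentence boundaries.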
The tools are free (licensed under the General Public License). You get the entire tool package (containing the newest version of all the tools and the library) in one download. Each tool is documented, and a general readme file is included. You can always e-mail me if something is still unclear (or if you find a bug).
A large number of these tools have been used for the Finnish Australian Immigrants Research. The Goall scripts are still configured for use in that research.
The core sensing tools for disambiguation, which use the WordNet Similarity tools by Ted Pedersen, have been completed. They were built for speed (about 10 times faster). Some of the optimizations I made for this have now also been included in the WordNet Similarity package (v 0.16).
What's New in This Release:
· A CorpusTagsetReducer tool was added to the corpus task-set for filtering out tags and tag-types.
· RowChecker, TableScaler, and TableTurner tools were added to the examine-set for checking the alignment of tags and words and for manipulating tab-delimited output tables.
· Several smaller fixes and additions were applied.
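
As a rough illustration of the kind of work the alignment-checking and table-turning tools do, the Python sketch below checks that every row of a tab-delimited table has the same number of columns and then transposes the table. It does not reproduce the interface or behaviour of RowChecker or TableTurner; all function names here are hypothetical.

```python
def parse_table(text):
    """Split tab-delimited text into a list of rows (lists of cells)."""
    return [line.split("\t") for line in text.splitlines() if line]


def misaligned_rows(rows):
    """Return the indices of rows whose width differs from the first row.

    For a table with a word row and a tag row, a non-empty result means
    that words and tags are out of alignment.
    """
    if not rows:
        return []
    width = len(rows[0])
    return [i for i, row in enumerate(rows) if len(row) != width]


def turn_table(rows):
    """Transpose a table so that rows become columns."""
    bad = misaligned_rows(rows)
    if bad:
        raise ValueError(f"rows with a different width: {bad}")
    return [list(column) for column in zip(*rows)]


def format_table(rows):
    """Join rows back into tab-delimited text."""
    return "\n".join("\t".join(row) for row in rows)
```

For example, a two-row table holding words on the first row and tags on the second becomes a two-column table with one word and tag pair per line after `turn_table`.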