TextSearch is a program that helps you search through a set of text files arranged in a hierarchical structure, i.e. a directory tree. Each document is searched using a regular expression, and an overview of the results is shown as a tree. Clicking on a file displays it with the matches highlighted.
Unlike other programs out there, the focus is not so much on statistics, i.e. how often a word occurs in an entire corpus of files, but rather on occurrences in individual files.
TextSearch is published as open source under the GPL.
Why do I need it?
a) You're writing your diploma thesis about the occurrences of certain terms in publications by various political parties, and you need a way to quickly look for a certain term in many files.
b) You're a programmer, you have a large code tree, and you'd just like to know where you've used sprintf() instead of snprintf().
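For case b), here is a minimal sketch of what such a search expression could look like. It uses Python's re syntax; the pattern, the variable names and the sample lines are illustrative assumptions and are not taken from TextSearch itself.

```python
import re

# Illustrative pattern, not part of TextSearch:
#   \b     word boundary, so names like vsprintf() are skipped
#   \s*\(  optional whitespace followed by the opening parenthesis of a call
UNSAFE_CALL = re.compile(r"\bsprintf\s*\(")

# Made-up sample lines from a C source file.
lines = [
    'sprintf(buf, "%d", n);',
    'snprintf(buf, sizeof buf, "%d", n);',
    'vsprintf(buf, fmt, args);',
    'len = sprintf (buf, "%s", name);',
]

# Keep only the lines that contain a plain sprintf() call.
matching = [line for line in lines if UNSAFE_CALL.search(line)]

for line in matching:
    print(line)
```

Only the first and the last sample line are reported; the snprintf() and vsprintf() calls are left alone.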
· Python (tested against 2.5)
· Qt 4
How do I use it?
a) Convert your files to text files, if they aren't already. To convert a Word document, I recommend "antiword"; for PDF documents (which can be difficult), I recommend pdftotext from the xpdf
TextSearch tries to work very well with special characters. For that to work, you need to choose the right encoding. Both of the above converters allow you to do that. I recommend using 'UTF-8' or ISO-8859-1 (aka Latin1) encoding: the latter for Western European languages, the former for everything else.
b) Start TextSearch. See point 3 about that. Now you need to tell it where it can find your files. To that end, click on the button labelled "Directory" and choose the base directory. Then choose the correct encoding in the drop-down menu. TextSearch will remember those settings the next time you start it.
c) Enter a search term. See below for details.
d) Hit "Go". It will search your files, and a tree listing will appear in the main widget.
e) If you want to look at the contents of a file (with a match), click on the file's name in the listing. It will be shown, with matches highlighted in green.
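To give a rough idea of what happens when you hit "Go", the steps above can be sketched as follows. This is not the TextSearch source: the function name search_tree, its parameters and the sample files are assumptions made for illustration, and the sketch is written for a current Python 3 rather than the Python 2.5 mentioned above.

```python
import os
import re
import tempfile


def search_tree(base_dir, pattern, encoding="utf-8"):
    """Return {relative_path: match_count} for every file with a match.

    Hypothetical helper: walks base_dir, decodes each file with the
    chosen encoding and counts case-insensitive regex matches.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    results = {}
    for dirpath, _dirnames, filenames in os.walk(base_dir):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding=encoding) as handle:
                    text = handle.read()
            except (UnicodeDecodeError, OSError):
                # Unreadable file or wrong encoding: skip it in this sketch.
                continue
            count = sum(1 for _ in regex.finditer(text))
            if count:
                results[os.path.relpath(path, base_dir)] = count
    return results


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "party_a"))
    with open(os.path.join(base, "party_a", "manifesto.txt"),
              "w", encoding="utf-8") as out:
        out.write("Freedom and security. Security first.")
    with open(os.path.join(base, "notes.txt"), "w", encoding="utf-8") as out:
        out.write("Nothing relevant here.")
    results = search_tree(base, r"security")

print(results)
```

In the demonstration, only party_a/manifesto.txt ends up in the result, with two matches. Files that cannot be decoded with the chosen encoding are silently skipped here, which is a simplification; it also shows why picking the right encoding matters.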
The syntax used for search expressions is called "regular expressions". These are pretty common, and there's about a trillion pages explaining them on the web. Two implementations have been used for TextSearch, which can be chosen using the drop-down menu at the top:
a) "QRegex" uses the implementation in the Qt library. The advantage of this is that you can use Unicode expressions in your search, i.e. you can search for special characters.
b) The Python re module is also available, by choosing "Regex". This does not allow Unicode characters, but I left it there anyway since it's slightly different from the Qt implementation and I thought some people might prefer it.
c) Both searches are, by default, case insensitive. If you choose the search options with the suffix "-Case", they are case sensitive.
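The difference between the default options and the "-Case" options can be illustrated with Python's re module. This is only a sketch under assumptions: it supposes that the case-insensitive options correspond to re.IGNORECASE, and it runs on a current Python 3, where string patterns handle Unicode by default, so it does not reproduce the Unicode limitation of the "Regex" option described above. The sample text is made up.

```python
import re

# Made-up sample text with a special character in three spellings.
text = "Müller, MÜLLER, müller"

# Case-insensitive, like the default search options.
insensitive = re.findall(r"müller", text, re.IGNORECASE)

# Case-sensitive, like the options with the "-Case" suffix.
sensitive = re.findall(r"müller", text)

print(len(insensitive), len(sensitive))
```

The case-insensitive search finds all three spellings, the case-sensitive one only the lowercase "müller".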
By clicking on "File types", you can tell TextSearch which files it should consider for the search; files are distinguished by their suffixes. You can enter a space-separated list of such suffixes, or tell it to search all files (the default). It will remember your choice.
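A suffix filter of this kind could look roughly like the sketch below. The function name wanted and its parameters are hypothetical, and whether TextSearch expects suffixes with or without the leading dot is not stated here, so the example simply compares the end of the file name.

```python
def wanted(filename, suffix_list=""):
    """Return True if filename should be searched.

    suffix_list is a space-separated string such as ".c .h".
    An empty string means "search all files" (the default).
    """
    suffixes = tuple(suffix_list.split())
    if not suffixes:
        return True
    # str.endswith accepts a tuple and checks each suffix in turn.
    return filename.endswith(suffixes)


print(wanted("main.c", ".c .h"))
print(wanted("README.txt", ".c .h"))
print(wanted("README.txt"))
```

With the list ".c .h", main.c is searched and README.txt is not; with an empty list, every file is searched. Note that a suffix entered without the dot, such as "c", would also accept a file called "misc" in this sketch.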