Txr is an interpreter for the txr query language. A txr query matches text and extracts pieces by binding them to variables that are embedded in the query. Txr can output the raw bindings gathered from the data, or substitute them into a template-driven report.
Great, but we already have sed, awk, perl ...
Though these tools support pattern matching in the form of regular expressions, they do not implement a whole-input pattern matching paradigm like txr.
All but the simplest text extraction tasks are difficult with sed, which is basically a regexp filtering program. When a data format spans multiple lines that must be correlated with one another, sed starts to show its weakness. Awk and perl are programming languages. They can perform complex text extraction, but the extraction is expressed as an algorithm rather than as a pattern.
A pattern is a form which resembles the data it matches. A perl or awk program isn't a pattern: it bears no resemblance to the data being processed, and it describes the detailed steps of the process rather than the data. For many such processes, a clearer, more succinct txr query can be written to do the same thing. An analogy may be drawn to other pattern languages such as grammars. A BNF grammar describes a language in a way that, say, the C++ source code of a recursive descent parser does not.
To develop a txr query, the user typically starts with sample data. The raw data is likely already a txr query which matches itself, once the characters that have a special meaning to txr are escaped. All that is left is to identify the parts that need to be variables, and to summarize the variations so that the query generalizes to all instances of the data.
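The idea of starting from sample data and replacing parts of it with variables can be mimicked in a few lines of Python. This is only a toy analogy and not how txr is implemented: the helper name compile_template is hypothetical, it handles a single line, it assumes each variable appears once, and it understands nothing but @name placeholders.

```python
import re

# Toy analogy only: treat a sample line as a pattern, where @name marks a
# variable and everything else is literal text that must match exactly.
PLACEHOLDER = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)")


def compile_template(template):
    """Turn a sample line with @name placeholders into a compiled regex.

    Literal text is escaped so that it matches itself; each placeholder
    becomes a named group. A name may be used only once per template.
    """
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text
        parts.append("(?P<%s>.*?)" % m.group(1))          # variable
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))


# A raw sample line, with no placeholders, is a pattern that matches itself.
sample = "root:x:0:0:root:/root:/bin/bash"
print(compile_template(sample).fullmatch(sample) is not None)

# Replacing the varying parts with variables generalizes the pattern.
query = compile_template("@user:@pw:@uid:@rest")
match = query.fullmatch("kaz:x:1000:1000:Kaz:/home/kaz:/bin/bash")
print(match.groupdict())
```

The template still looks like a line of /etc/passwd, which is the property the paragraph above describes. A real txr query adds multi-line matching, directives and output templates on top of that idea.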
In short, a truly practical extraction and report language has arrived, and its name is Txr.
Talk is cheap; how about an example?
Fine. Instead of "Hello, world", how about something more advanced? One tool that I dislike in Unix and Linux is the ps utility for listing processes. I've been using Unix since 1989 and Linux since 1993, and I'm not dumb; yet, whenever I need ps to do something slightly out of the ordinary, I have to resort to the man page, and then I still can't get it to do what I want half the time.
With Txr, we can easily make a quick and dirty ps utility (which relies on the /proc filesystem on Linux). Here is what the query looks like. This might be saved in a file called ps.txr:
@(next)$/proc
@(collect)
@{process /[0-9]+/}
@ (next)/proc/@process/status
Name:@ @name
State:@ @state (@state_desc)
@(skip)
Tgid:@ @tgid
Pid:@ @proc_id
PPid:@ @parent_id
@(bind pid proc_id)
@(bind ppid parent_id)
@(skip)
Uid:@ @uid@ @/.*/
Gid:@ @gid@ @/.*/
@ (next)$/proc/@process/task
@ (collect)
@thr
@ (end)
@ (bind thread thr)
@ (some)
@ (next)/etc/passwd
@ (skip)
@user:@pw:@uid:@/.*/
@ (or)
@ (bind user uid)
@ (end)
@(end)
@(output)
USER       PID  PPID S NAME             THREADS
@ (repeat)
@{user 8} @{proc_id -5} @{parent_id -5} @state @{name 16} @(rep)@thr, @(first)@(last)@thr@(single)~@(end)
@ (end)
@(end)
Now, we can run the query like this:
shell$ txr ps.txr
We get output which looks like this:
USER       PID  PPID S NAME             THREADS
root         1     0 S init             ~
root         2     1 S ksoftirqd/0      ~
root         3     1 S events/0         ~
root         4     3 S khelper          ~
root         5     3 S kacpid           ~
root        16     3 S kblockd/0        ~
root        29     3 S aio/0            ~
root        17     1 S khubd            ~
root      2954  2953 S bash             ~
[ ... ]
root     16134  1887 S sshd             ~
kaz      16136 16134 S sshd             ~
kaz      16137 16136 S bash             ~
kaz       3628  2175 S slrn             ~
root      3721  1963 S crond            ~
root      3722  3721 S run-parts        ~
root      3723  3722 S 00-logwatch      ~
root      3724  3722 S awk              ~
root      3940  3723 S mail             ~
root      4049  3723 S zz-disk_space    ~
root      4051  4049 S df               ~
root      4052  4049 S grep             ~
kaz       4266     1 S ssh-agent        ~
kaz       4331 16137 S vim              ~
kaz       4426 31908 R txr              ~
The txr query works by processing the numeric entries under the /proc directory, reading the /proc/<pid>/status file of each process and the list of threads under /proc/<pid>/task. The user IDs are resolved by matching against the /etc/passwd file; when no entry matches, the numeric user ID is shown instead.
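For comparison, here is roughly what the same steps look like when written as an algorithm, in the sense discussed earlier for awk and perl. This is a Python sketch under stated assumptions, not a translation of the query: the function names are made up for illustration, the thread column simply lists the thread IDs separated by commas, and the column widths (8, 5, 5 and 16) are taken from the output template of the query.

```python
import os
import re

NUMERIC = re.compile(r"[0-9]+")  # same test as @{process /[0-9]+/}


def read_status(path):
    """Parse the 'Key:<whitespace>value' lines of a status file into a dict."""
    fields = {}
    with open(path, errors="replace") as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep:
                fields[key] = value.strip()
    return fields


def load_users(passwd_path):
    """Map numeric user ID (as a string) to user name; first entry wins."""
    users = {}
    with open(passwd_path, errors="replace") as f:
        for line in f:
            parts = line.rstrip("\n").split(":")
            if len(parts) >= 3:
                users.setdefault(parts[2], parts[0])
    return users


def first_word(text, default="?"):
    words = text.split()
    return words[0] if words else default


def list_processes(proc_root="/proc", passwd_path="/etc/passwd"):
    """Return (user, pid, ppid, state, name, threads) for each process."""
    users = load_users(passwd_path)
    rows = []
    pids = [e for e in os.listdir(proc_root) if NUMERIC.fullmatch(e)]
    for pid in sorted(pids, key=int):
        try:
            status = read_status(os.path.join(proc_root, pid, "status"))
            tasks = os.listdir(os.path.join(proc_root, pid, "task"))
        except OSError:
            continue  # the process went away while we were looking
        threads = sorted((t for t in tasks if NUMERIC.fullmatch(t)), key=int)
        uid = first_word(status.get("Uid", ""))
        rows.append((
            users.get(uid, uid),  # fall back to the numeric ID
            status.get("Pid", pid),
            status.get("PPid", "?"),
            first_word(status.get("State", "")),
            status.get("Name", "?"),
            threads,
        ))
    return rows


def format_row(row):
    """Lay out one row using the field widths from the query's template."""
    user, pid, ppid, state, name, threads = row
    return "%-8s %5s %5s %s %-16s %s" % (
        user, pid, ppid, state, name, ", ".join(threads))
```

On a Linux system, calling list_processes() and printing format_row(row) for each row gives a listing comparable to the one above. Note how the code spells out loops, error handling and lookups, while none of it resembles the status file it reads.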
What's New in This Release:
· This version has been ported to OS X, FreeBSD, and NetBSD.
· It supports a few popular regex tokens and exposes the regex compiler as a function, allowing programs to build and use regular expression syntax trees.
· It provides new ways of iterating over hash tables with lazy lists and adds some time functions.
· It improves the seeding of the PRNG and fixes a bug related to argument processing in the @(next) directive.