dbacl is a digramic Bayesian text classifier. Given some text, it calculates, for each of any number of previously learned document collections, the posterior probability that the input resembles that collection.
dbacl can be used to sort incoming email into arbitrary categories such as spam, work, and play, or simply to distinguish English text from French text.
It fully supports international character sets, and uses sophisticated statistical models based on the Maximum Entropy Principle.
The dbacl project includes a tutorial or two and a mathematical design paper (.ps.gz). Online manual pages are also available for dbacl and for bayesol, mailcross, mailtoe, mailfoot, and mailinspect.
I have found two uses for dbacl so far:
* As an automated Bayesian email classification tool, it can recognize spam, and more generally sort incoming email into any number of categories such as work, play, etc.
* As a noise filter, it is useful during the indexing of personal document collections.
Both dbacl and its companion programs are written in C and run on UNIX/POSIX.
What's New in This Release:
· This is a hodge-podge of fixes and improvements.
· A new hypex command, the TREC 2005 options files, and an essay on chess are now in the tarball.
· The parsing engine received several improvements and bug fixes, including a new -e char option.
· Compilation problems on various architectures were fixed, and libslang2 support was added.