parp is a powerful, extensible e-mail filter with sophisticated anti-spam capabilities. It is designed as a complete replacement for procmail, is MIME-aware, and can run as a filter, as a daemon, or directly on mailboxes.
This was yet another personal itch which needed scratching. I receive between 5 and 20 spam e-mails most days. Not only was it mildly annoying to have to hit delete more often than normal, but I also forward all e-mail which ends up in my main inbox to my mobile (cell) phone via email2sms and an Internet/SMS gateway, and I was sick to death of my phone bleeping throughout the day purely because of junk mail.
I started looking at all the available anti-spam filters. Over a period of two years, I looked at many, including the NAGS filter, despam, various complex anti-spam procmailrcs, the spamometer, blackmail, filter.plx, zfilter, spamstop, junkfilter ... but various things put me off all of them:
Some weren't written in Perl. Call me a Perl bigot, but if there was ever a case of Perl being the right tool for the job, it's an e-mail filter. Extensibility and maintainability were very high on my list.
Some were terribly coded. I refuse to put my e-mail at the mercies of bad code (and that includes sendmail ;-).
Some insisted that you use a particular MDA or MUA. I have no intentions of changing from mutt and qmail.
Many filtered on only the headers, or only the body. I want to filter on both, not all the time, but in some circumstances.
None were as accurate as I wanted. My goal was at least 99% accuracy. (At the time of writing, parp's accuracy is hovering around the 99.8% mark.)
Here are some key features of "parp":
· Can act as a filter in a similar manner to procmail, or directly on files in Mbox format (and possibly other formats via Mail::Box - untested), or as a daemon processing mails from a spool. In the daemon case, mails are injected into the queue via a tiny (15k on my system) executable which handles locking correctly.
· Standard filtering actions are available (deliver to mailbox, pipe to command, reject as junk etc.)
· Highly sophisticated spam detection heuristics: currently around 40 different tests are performed in the worst case, although all tests are optimised for speed (e.g. fast tests are performed on the headers first, then slower tests are performed on the body only if necessary). N.B. I'm considering incorporating the SpamAssassin ruleset at some point too.
· Optional cross-checking with the Open Relay Database.
· Filter adds X-Parp-Accepted: and X-Parp-Rejected: headers so that you can easily monitor its filtering strategy without leaving your mail reader.
· MIME multi-part aware, e.g. will not be confused by binary attachments.
· Berkeley DB format friends database, for keeping false positives to an absolute minimum.
· Automatic extraction of addresses into the friends database from emails which pass the spam tests. Semi-automatic removal of addresses from the friends database on the rare occasions parp gets it wrong. The friends database is also easily editable with my dbm utility.
· Other `grace' tests allowing bona fide persons' communications through (e.g. passworded e-mails) just in case all the other tests go badly wrong.
· The configuration files are written in raw Perl, so you can extend the filter arbitrarily using the main program's API.
· Comprehensive logging and error-trapping systems.
· Auxiliary program to print out comprehensive statistics on all aspects of filtering (see the sample output).
· Ability to log false positives/negatives when spam detection has gone wrong, in a format which the statistics program can interpret to determine the filter's current spam detection accuracy.
· Mostly RFC822-compliant state machine parser of Received headers, enabling extensive spam trace analysis and retaliative action. Read its man page or source if you're curious.
· Duplicate removal (by Message-ID).
· Emails which have already been filtered can be used as regression tests, to easily spot problems when you make changes to your filtering logic.
· Limited documentation so far. This is gradually improving.
· Requires some knowledge of Perl / programming. (Ironically, if it didn't, there would be far greater limitations to the filter's flexibility.)
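
As a rough illustration of how several of the features above fit together (duplicate removal, the friends database, and fast header tests run before slower body tests), here is a minimal sketch. It is written in Python for brevity and is not parp's code: parp itself is Perl, and every name below (FRIENDS, seen_ids, header_tests, body_tests, classify), the two toy tests, and the order of the checks are assumptions made purely for the example.

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr

# Hypothetical in-memory friends list. parp keeps its friends database
# in a Berkeley DB file; a set stands in for it here.
FRIENDS = {"alice@example.org"}

# Message-IDs already seen, used for duplicate removal.
seen_ids = set()


def header_tests(msg: Message):
    """Cheap checks that look only at headers. Returns a reason or None."""
    if "$$$" in msg.get("Subject", ""):
        return "suspicious subject"
    if msg.get("Message-ID") is None:
        return "missing Message-ID"
    return None


def body_tests(msg: Message):
    """Slower checks over text/plain parts only, so binary attachments
    in a MIME multi-part message are never scanned."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        text = part.get_payload(decode=True) or b""
        if b"click here to unsubscribe" in text.lower():
            return "spam phrase in body"
    return None


def classify(raw: str):
    """Return a (verdict, reason) pair for one raw message."""
    msg = message_from_string(raw)

    # Duplicate removal by Message-ID (assumed to happen first).
    msg_id = msg.get("Message-ID")
    if msg_id is not None:
        if msg_id in seen_ids:
            return ("duplicate", msg_id)
        seen_ids.add(msg_id)

    # Known friends bypass the spam tests, keeping false positives low.
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in FRIENDS:
        return ("accepted", "sender in friends list")

    # `or` short-circuits: body tests run only if header tests pass.
    reason = header_tests(msg) or body_tests(msg)
    if reason:
        return ("rejected", reason)

    # Mail that passes every test teaches the filter a new friend.
    if sender:
        FRIENDS.add(sender)
    return ("accepted", "passed all tests")
```

Calling classify on a raw message yields a verdict plus a reason, loosely analogous to the information parp records in its X-Parp-Accepted: and X-Parp-Rejected: headers. A real configuration would express the same kind of logic in Perl against parp's own API.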