PDFassassin 0.1 Beta

PDFassassin is a PDF module for SpamAssassin.
With the recent torrent of PDF spam, we created a module for SpamAssassin that allows for the scanning of PDF files. The module, linked below this post, works in the following way:

Incoming email is scanned upon connection and checked for PDF attachments.
Text is extracted from the PDF via pdftotext, and scanned by SpamAssassin.
Should the PDF contain images, the gocr binary is called to extract the text content.
The total spam score of the PDF is compared against the global required_score setting; if it is higher, the score specified in pdf.cf (10 by default) is added to the overall score of the email message.
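
The steps above can be sketched roughly in shell. This is only an illustration and not the code in Pdf.pm: the function names are hypothetical, and using pdfimages to pull the images out of the PDF before handing them to gocr is an assumption, since the post does not say how the images are extracted.

```shell
#!/bin/sh
# Sketch only: the function names and the use of pdfimages are assumptions,
# not taken from Pdf.pm.

# Print the text layer of a PDF, then OCR any embedded images.
extract_pdf_text() {
    pdf=$1
    pdftotext "$pdf" -                  # "-" writes the extracted text to stdout
    tmp=$(mktemp -d) || return 1
    pdfimages "$pdf" "$tmp/img"         # assumed step: dump images as PNM files
    for img in "$tmp"/img-*; do
        [ -e "$img" ] || continue       # the PDF contained no images
        gocr -i "$img"                  # OCR result goes to stdout
    done
    rm -rf "$tmp"
}

# Print the points to add to the email's overall score.
# $1 = score of the extracted PDF text
# $2 = required_score (global threshold)
# $3 = score of the PDF rule in pdf.cf (defaults to 10)
pdf_points() {
    awk -v s="$1" -v r="$2" -v p="${3:-10}" \
        'BEGIN { if (s + 0 > r + 0) print p + 0; else print 0 }'
}
```

With a required_score of 5.0, "pdf_points 7.2 5.0" prints 10 and "pdf_points 1.3 5.0" prints 0. The comparison is strict, matching the "if it is higher" wording.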

This approach departs from the usual method: it runs the extracted content through the SpamAssassin engine instead of matching it against a word list.



Installation directions:

To install, copy the Pdf.pm and pdf.cf files to your SpamAssassin configuration directory. This is /usr/local/atmail/spamassassin/etc/ in @Mail installations, and /etc/mail/spamassassin in other default Linux installs.
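
As a minimal sketch, the copy step could look like the following. The install_pdfassassin helper is hypothetical, and it assumes both files sit in the directory given as the first argument; adjust the target to match your installation.

```shell
#!/bin/sh
# Hypothetical helper: copy the plugin and its config into place.
# $1 = directory holding Pdf.pm and pdf.cf
# $2 = SpamAssassin configuration directory
install_pdfassassin() {
    src_dir=$1
    conf_dir=$2
    cp "$src_dir/Pdf.pm" "$src_dir/pdf.cf" "$conf_dir"/
}

# Example for a default Linux install (usually needs root):
#   install_pdfassassin . /etc/mail/spamassassin
```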

You will need to lint the SpamAssassin configuration files afterwards:

$ /usr/local/atmail/spamassassin/bin/spamassassin -D --lint

If the installation was successful, the debug output should include lines like the following:

dbg: plugin: fixed relative path: /usr/local/atmail/spamassassin/etc//Pdf.pm
dbg: plugin: loading Pdf from /usr/local/atmail/spamassassin/etc/Pdf.pm
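
Because the debug output is long, it can help to filter it for the plugin line. The sketch below assumes the debug messages are written to stderr (hence the 2>&1) and that the "loading Pdf from" text matches the sample above; the pdf_plugin_loaded name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the lint output on stdin shows
# the Pdf plugin being loaded.
pdf_plugin_loaded() {
    grep -q 'plugin: loading Pdf from'
}

# Example for an @Mail install:
#   /usr/local/atmail/spamassassin/bin/spamassassin -D --lint 2>&1 \
#       | pdf_plugin_loaded && echo "Pdf plugin loaded"
```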

SpamAssassin will then scan PDF attachments.

To change the score assigned to PDF spam emails, edit the score definition in the pdf.cf file:

score PDF 10.0
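
If you prefer to script the change, something like the following would rewrite that line. It is a sketch that assumes the definition appears on a single line beginning with "score PDF", as shown above; the set_pdf_score name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace the PDF score in a pdf.cf file.
# $1 = path to pdf.cf
# $2 = new score, e.g. 7.5
set_pdf_score() {
    cf_file=$1
    new_score=$2
    tmp=$(mktemp) || return 1
    # Write through a temp file, then copy back so the original
    # file keeps its ownership and permissions.
    sed "s/^score PDF .*/score PDF $new_score/" "$cf_file" > "$tmp" &&
        cat "$tmp" > "$cf_file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example:
#   set_pdf_score /etc/mail/spamassassin/pdf.cf 7.5
```

Running the lint command again after the edit confirms the file still parses.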

What's New in This Release:

This initial release implements the SpamAssassin module, which scans the content of PDF attachments in email messages and adds the resulting spam score to that of the originating email message.

