Wednesday, August 16, 2006

blocking spam

There are two critical things that any anti-spam system must do: it must not lose email, and it must not cause damage to the rest of the net.

To avoid losing email, every message must either be accepted for delivery or have its sender notified that it was not.

To avoid causing damage to the rest of the net, spam should not be bounced to innocent third parties. Accepting mail, processing it, and then bouncing the messages that appear to be spam will do exactly that, because the sender address on spam is usually forged and the bounce goes to whoever owns the forged address.

The only exception to these two conditions is virus email, which can be positively identified as bad and can therefore be silently discarded. For any other category of unwanted mail there is always a possibility of a false positive, so the sender should be notified if the mail will not be accepted.

Therefore the only acceptable method of dealing with spam is to reject it at the SMTP protocol level. I am currently not aware of any software that supports Bayesian filtering while the message is being received, so that it can be rejected if it appears to be spam. It would be possible to do this (I could write the code myself if I had enough spare time), but AFAIK no-one has done it.
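As a rough illustration of what rejecting at the SMTP protocol level means, the sketch below picks the reply a receiving server would give at the end of the DATA phase, while the connection is still open. The function name and the reply texts are hypothetical; only the 250 and 550 reply codes are standard SMTP. With a 550 the sending server still holds the message, so it is that server's job to notify its own user, and nothing is bounced to a forged address.

```python
def reply_after_data(looks_like_spam: bool) -> str:
    """Choose the SMTP reply to send once the message body has been received.

    Hypothetical helper for illustration; a real MTA makes this decision
    inside its own policy hooks.
    """
    if looks_like_spam:
        # Permanent failure: the message is never accepted, so this server
        # never has to generate a bounce of its own.
        return "550 5.7.1 Message rejected as spam"
    # Success: from here on this server is responsible for delivery.
    return "250 2.0.0 Message accepted for delivery"
```

A legitimate sender whose mail is wrongly rejected this way gets an error report from their own mail server, which satisfies the requirement that no mail is lost silently.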

The most popular methods of recognising spam before it is accepted are DNSBLs (DNS-based lists of IP addresses known to send spam), RHSBLs (DNS-based lists of domains known to be run by spammers), and gray-listing (giving a transient error in the expectation that many spammers won't try again).
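Gray-listing is the easiest of the three to show in code. The following is a minimal sketch, not a production implementation: it assumes the common approach of keying on the (client IP, sender, recipient) triplet and a five minute minimum delay, and the class name, the delay, and the reply texts are all my own choices for illustration. State is kept in memory and never expires, which a real server would have to handle.

```python
import time


class Greylist:
    """Temporarily refuse the first delivery attempt of each triplet."""

    def __init__(self, min_delay=300.0, clock=time.time):
        self.min_delay = min_delay  # seconds a sender must wait before retrying
        self.clock = clock          # injectable so the logic can be tested
        self.first_seen = {}        # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient):
        """Return the SMTP reply to give at RCPT TO for this attempt."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        # Record the first attempt; later attempts reuse the stored time.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "250 2.1.5 Recipient OK"
        # Transient error: a real mail server queues the message and retries,
        # while much spam software simply gives up.
        return "451 4.7.1 Greylisted, please try again later"
```

The first attempt from a new triplet always gets the 451 reply, and any retry after the delay has passed is accepted.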

Gray-listing is not effective enough to be used on its own, so DNSBL and RHSBL systems are required for a usable email system. A quick review of the logs of some of my clients' mail servers suggests that the DNSBL dnsbl.sorbs.net alone is stopping an average of 20 spam messages per user per day! The SORBS system is designed to block open relays, machines that send mail to spam-trap addresses, and some other categories of obviously inappropriate use. The number of false positives is very small. On average I add about one white-list entry per month, which isn't much for the email of a dozen small companies. For every white-list entry I have added, I have known that the sender has had a spam problem. I have never had to add a white-list entry because a DNSBL made a mistake, only because people want to receive mail from a system that also sends spam.
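For readers who have not seen how a DNSBL is queried, here is a small sketch. The usual convention is to reverse the octets of the connecting IPv4 address, append the list's zone, and do an ordinary DNS lookup: an answer means the address is listed, and a "no such name" result means it is not. The function names, the in-memory white-list, and the choice to treat lookup failures as "not listed" are assumptions made for this example, not a description of any particular mail server.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="dnsbl.sorbs.net"):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="dnsbl.sorbs.net", whitelist=frozenset(),
              resolve=socket.gethostbyname):
    """Return True if the DNSBL lists this address and it is not white-listed."""
    if ip in whitelist:
        # Local white-list entries override the DNSBL.
        return False
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No record found (or the lookup failed): treat as not listed.
        return False
    return True
```

For example, dnsbl_query_name("192.0.2.10") gives "10.2.0.192.dnsbl.sorbs.net". A server that finds the address listed would then refuse the mail with a 5xx reply during the SMTP conversation rather than accepting it and bouncing it later.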

I was prompted to write about anti-spam measures by an ill-informed and unsubstantiated comment on my blog regarding DNSBL services.

If anyone wants to comment on this, please feel free. But keep in mind that I have a lot of experience running mail servers, including those of large ISPs with more than a million customers. The advice I give on anti-spam measures concerns techniques that I have successfully used on ISPs of all sizes and that I have found to work well even when both ends use them. Make sure that you substantiate any comments you make and explain them clearly. Saying that something is stupid is not going to impress me when I've seen it work for over a million users.

1 comment:

Unknown said...

Note that SORBS is being criticized for indiscriminately blocking TOR.