
Fighting Spam @ Clarinet
Published on July 14, 2011

Introduction
Fighting spam, for the service provider, consists of mounting a defence against a series of ongoing attacks. Some of these attacks are new, but the vast majority are repeats of strategies that worked some time in the past. In the case of old strategies, we can respond quickly and effectively because we already have mechanisms in place; new strategies tend to succeed for a time because we have to engineer a solution. Service providers have a number of weapons in their arsenal including:
- Greylisting
- Black lists
- Forward and Reverse DNS checks
- Mail filtering
- Spam traps
- Personal Virus filters
- Personal Mail filters
- Learning mail filters
Spam, Scams, Phishing, and Malware
Over half of the e-mail arriving at Clarinet’s mail servers is unwanted by the final recipients. This e-mail is commonly known as spam. However, it is more useful to classify this e-mail into a number of sub-types based on the objectives of the sender:
- Unsolicited Commercial E-mail (UCE)
- Senders of this type of e-mail want the recipients to buy an item. These messages contain an offer for goods or services and some instructions for ordering them. This type of e-mail always contains at least one piece of information that identifies the sender; otherwise there is no way for them to collect the orders. UCE works because e-mail is cheap to send, so even very low response rates result in a profit. Strictly speaking, the word spam applies only to this type of e-mail; however, its casual usage has broadened to cover any unwanted e-mail.
- Scams
- Although P.T. Barnum didn’t coin the expression “There’s a Sucker Born Every Minute”, the Internet is rife with schemes that assume this is true and are designed to separate people from their money. These schemes often promise riches in exchange for little effort if the recipient will just provide a little money or access to their bank accounts. Once again the low cost of sending out e-mails makes it cost effective to find the relatively few respondents.
- Phishing
- Phishing attempts to trick the recipient into providing personal information, typically usernames, passwords and bank account details to the sender. These schemes are quite sophisticated in that they often exactly reproduce the look and feel of a genuine communication from a bank or service provider.
- Malware
- Viruses, bots and spyware are programs which the sender of the message wants the recipient to run:
- Viruses just want to reproduce but may also carry a payload that damages the running computer or installs spyware or a bot
- Bots respond to instructions and can be used to send e-mail or damage the services of other users
- Spyware extracts information from the target computer. This information could include the keystrokes of passwords or the contents of files
Malware is almost always an executable program, so we only need to check executable programs to see whether they are a recognised virus, bot or spyware.
Spam Defence at the ISP

False positives, False Negatives
The difficulty for the service provider is that senders of unwanted e-mails work hard to make them look like ordinary e-mail that the recipients want. The service provider sorts e-mails into groups:
- We definitely think this is spam
- Not sure
- We definitely think this is a desired e-mail
Even an apparently clear-cut case can be misjudged. Almost no-one wants to receive a virus in an e-mail message; however, a computer security researcher may have asked a colleague to send him an example of a new virus for his virus collection.
For any method, the rate of false positives is inversely related to the rate of false negatives. Thus tightening the rules decreases the amount of spam that gets through, but at the cost of increasing the number of wanted messages that are misclassified as spam.
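The trade-off can be made concrete with a small Python sketch. It assumes that each message has already been given a numeric score by some filter and that a single threshold separates spam from desired mail; the scores, labels, thresholds and function names below are invented for illustration and are not Clarinet's settings.

```python
# Hypothetical scored messages: (filter score, True if the message really is spam).
# The numbers are made up purely to illustrate the trade-off.
MESSAGES = [
    (0.5, False), (1.2, False), (2.8, False), (4.1, False),
    (3.5, True), (5.0, True), (6.7, True), (9.3, True),
]


def error_counts(messages, threshold):
    """Return (false_positives, false_negatives) when scores >= threshold count as spam."""
    false_positives = sum(1 for score, is_spam in messages
                          if score >= threshold and not is_spam)
    false_negatives = sum(1 for score, is_spam in messages
                          if score < threshold and is_spam)
    return false_positives, false_negatives


def classify(score, spam_threshold=6.0, desired_threshold=3.0):
    """Sort a score into the three groups listed above (thresholds are assumptions)."""
    if score >= spam_threshold:
        return "spam"
    if score < desired_threshold:
        return "desired"
    return "not sure"


if __name__ == "__main__":
    # A lower threshold is a tighter rule: less spam gets through,
    # but more wanted mail is misclassified.
    for threshold in (2.0, 3.0, 4.5, 7.0):
        fp, fn = error_counts(MESSAGES, threshold)
        print(f"threshold {threshold}: {fp} false positives, {fn} false negatives")
```

Running the loop over several thresholds shows one error count rising as the other falls, which is why a "not sure" band between two thresholds is useful.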
Greylisting
We use a specially modified version of OpenBSD’s pfspamd to provide this service. Our patches to pfspamd:
- allow it to work on FreeBSD with IPFW
- enable stuttering to discourage bulk e-mailers by wasting their time
- delay closing the connection after issuing the temporary failure message to prevent some mail servers from assuming the connection was lost and ignoring the temporary failure message
- add connection ids for tracking sessions in logs
Greylisting is the most recent addition to our arsenal of anti-spam tools. The concept of greylisting is based on the observation that normal mail servers attempt to re-send mail if they are politely told that the mail server is unable to accept it at this time; bulk e-mail tools, on the other hand, are designed to send as much e-mail as possible, so when they are told to come back later, they tend not to. The secret to making greylisting not delay all e-mail is to allow mail servers which have successfully retried sending an e-mail to always get through immediately.
There are a number of problems with greylisting:
- Some mail providers have pools of mail servers, so a message is only accepted once it has come from the same server twice.
- Some mail clients try to talk directly to their destinations and hence don’t retry regularly.
- The first time a mail server is seen its e-mail is delayed.
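The core greylisting decision can be sketched in a few lines of Python. This is an illustrative toy, not pfspamd or our patches: it keys only on the connecting server's IP address, and the `Greylist` class, the `min_delay` value and the return strings are all assumptions made for the example. A real greylister usually tracks more than the IP address and expires old entries.

```python
import time


class Greylist:
    """Toy greylist keyed on the sending server's IP address (illustrative only)."""

    def __init__(self, min_delay=300, clock=time.time):
        self.min_delay = min_delay   # seconds a new server must wait before a retry counts
        self.clock = clock           # injectable clock so the sketch can be tested
        self.first_seen = {}         # ip -> time of the first delivery attempt
        self.whitelist = set()       # servers that have retried successfully

    def check(self, ip):
        """Return 'accept' or 'tempfail' for a connection from ip."""
        if ip in self.whitelist:
            return "accept"          # known-good servers get through immediately
        now = self.clock()
        first = self.first_seen.setdefault(ip, now)
        if now - first >= self.min_delay:
            # The server came back after being told to retry later.
            self.whitelist.add(ip)
            del self.first_seen[ip]
            return "accept"
        return "tempfail"            # politely ask the server to try again later
```

Because the key is the IP address, a retry that arrives from a different server in the same pool starts from scratch, which is the first problem listed above.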
Black lists
We use several black lists including:
- combined.njabl.org
- xbl.spamhaus.org
- bl.spamcop.net
As sources of spam are identified, they are reported to the operators of real-time black lists. Service providers who subscribe to a black list check each incoming mail server connection against it. If the server appears on the list, the message is not accepted and the reject message suggests that genuine senders should appeal to the black list to be removed. Spammers don’t get these messages as they tend to conceal their identity by pretending to be users of other systems.
Some black lists also list misconfigured mail servers that can be exploited by spammers for sending e-mails indirectly. These misconfigured mail servers are known as open relays.
Black lists can be highly effective as they can gather information rapidly from many corners of the globe.
Black lists are imperfect too:
- Black lists are necessarily behind the producers of spam: they can only list new sources of spam and misconfigured mail servers after they are detected.
- The quality of the black list depends on the diligence of the maintainers. Maintainers need to both add new sources quickly and remove incorrectly added sites quickly. A failure to do either of these results in spam getting through or valid e-mails being rejected.
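DNS-based black lists are conventionally queried by reversing the octets of the connecting IPv4 address, appending the list's zone name and looking the result up in the DNS; an answer means the address is listed. The Python sketch below shows that lookup. The function names are invented for the example, the `resolve` parameter exists only so the sketch can be exercised without network access, and treating every lookup failure as "not listed" is a simplifying assumption.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNS-based black list."""
    address = ipaddress.IPv4Address(ip)   # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the black list publishes an address record for ip."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No record (or the lookup failed): treat the address as not listed.
        return False
```

For example, `dnsbl_query_name("192.0.2.99", "bl.spamcop.net")` builds the name `99.2.0.192.bl.spamcop.net`.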
Forward and Reverse DNS checks
Clarinet does not reject a mail just because it fails the forward and reverse DNS checks. We just add points to its SpamAssassin score (see Mail filtering).
Humans work well with names whilst computers work well with numbers. The Domain Name System (DNS) translates names to numbers and vice-versa. Service providers can check:
- that the site talking to it has a reverse DNS entry, i.e. has a name associated with its number
- that the name gained from the reverse DNS entry, when looked up, matches the IP address of the site talking to it
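The two checks can be sketched in Python with the standard `socket` module. The sketch returns points to add to a spam score rather than a reject decision; the point values, the `dns_penalty` name and the injectable `reverse` and `forward` parameters are assumptions made for the example, and `socket.gethostbyname_ex` handles IPv4 only.

```python
import socket


def dns_penalty(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex,
                no_reverse_points=1.0, mismatch_points=2.0):
    """Return points to add to a message's spam score based on the DNS checks."""
    try:
        hostname, _aliases, _addresses = reverse(ip)
    except (socket.herror, socket.gaierror):
        return no_reverse_points      # no name is associated with the number
    try:
        _name, _aliases, addresses = forward(hostname)
    except (socket.herror, socket.gaierror):
        return mismatch_points        # the name does not resolve at all
    # The name must map back to the address that connected to us.
    return 0.0 if ip in addresses else mismatch_points
```

Passing in stand-in resolver functions makes it possible to try the logic without touching the real DNS.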
Mail filtering
There are many aspects to mail filtering:
- Pattern matching
- Dangerous attachment type removal
- Virus filtering
- Optical character recognition
- Image statistical characteristics
Pattern Matching
Many spam messages contain sets of words or characters that are common in spam but not common in ordinary messages. For instance, a message containing a misspelling of “Viagra” and instructions on how to buy it online is quite likely to be spam. By allocating points to each of these sets of words or characters, a score can be computed for each message. Messages with high scores can be discarded and messages with intermediate or low scores are passed on to the recipient. In addition to matching the content of the message, the headers of the message are examined for features that indicate that part of the message path has been faked or that the message is from a non-existent source. Valid e-mail is very unlikely to have these features, hence they attract a high number of points.
Dangerous attachment type removal
Some attachments can be used as vectors for malware or phishing. These attachment types are typically used for scripts and programs. Furthermore, there are some attachments that have no meaning when sent through e-mail. An example of a useless attachment is:
- .lnk files that refer to local files are not useful on machines other than the sending computer, unless there is already a file in the same place on the receiving machine with the same contents
Removing attachments because they might contain something bad protects customers from new malware for which signatures have yet to be generated, but also generates resentment from knowledgeable users when harmless items have been removed.
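Filtering by attachment type can be sketched with Python's standard `email` package. The sketch only reports which attachments would be removed; it does not rewrite the message. The extension list is a hypothetical example (only .lnk is named above), and `blocked_attachments` is a name invented for the illustration.

```python
import os
from email.message import EmailMessage

# Hypothetical block list: only .lnk is named in the text, the rest are examples.
BLOCKED_EXTENSIONS = {".lnk", ".exe", ".scr", ".vbs"}


def blocked_attachments(message):
    """Return the filenames of attachments whose extension is on the block list."""
    blocked = []
    for part in message.walk():              # visits every MIME part of the message
        filename = part.get_filename()       # None for parts that are not attachments
        if filename and os.path.splitext(filename)[1].lower() in BLOCKED_EXTENSIONS:
            blocked.append(filename)
    return blocked


if __name__ == "__main__":
    msg = EmailMessage()
    msg["Subject"] = "Quarterly report"
    msg.set_content("See attached.")
    msg.add_attachment(b"...", maintype="application",
                       subtype="octet-stream", filename="Shortcut.lnk")
    msg.add_attachment(b"...", maintype="application",
                       subtype="pdf", filename="report.pdf")
    print(blocked_attachments(msg))
```

Matching on the file extension alone is the bluntest possible rule, which is exactly why it catches unknown malware and also why it sometimes removes harmless files.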
Virus filtering
We use ClamAV to filter for viruses and update both the engine and the signatures regularly.
Attachments are unpacked and scanned. The scanner checks for signatures in the files and if one matches then the e-mail is rejected.
Virus filtering has two flaws, both minor:
- Like pattern matching, virus scanners are always playing catch-up with the virus writers. It takes time for a new virus to be analysed and a signature created and distributed. During this window viruses can spread.
- There is a very small chance that a virus signature will match a file that does not contain a virus. This would lead to the e-mail being rejected.
Optical character recognition
We have developed our own OCR-based spam scanning and have integrated it with SpamAssassin.
The effectiveness of pattern matching has driven some spammers to avoid using words so they send pictures instead. By running these pictures through an Optical Character Recognition (OCR) program and applying pattern matching to the output it is possible to detect spam features in the pictures.
Image statistical characteristics
We have developed our own statistical measures for detecting images used in spam messages and have integrated them with SpamAssassin.
In the continuing war between spam senders and ISPs, the spammers have discovered that distorting their images and adding speckles and other “noise” reduces the effectiveness of OCR. However, some statistical properties are still more common in spam images than in other images. These include measures of the information density of the image (the number of bytes required to represent a pixel) and the fact that spam images tend not to have extreme aspect ratios (unlike section separators, spam is rarely a few pixels high and a screen wide). Although these measures are not definitive, they can be used in combination with other measures to improve detection.
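The two measures named above are easy to compute once an image's dimensions and file size are known. The Python sketch below is not our SpamAssassin integration: the thresholds, the density range and the point value are placeholder guesses chosen only to make the example run, and real values would have to be derived from collections of spam and non-spam images.

```python
def image_features(width, height, file_size_bytes):
    """Compute (bytes per pixel, aspect ratio) for one image."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    bytes_per_pixel = file_size_bytes / (width * height)
    aspect_ratio = max(width, height) / min(width, height)   # always >= 1
    return bytes_per_pixel, aspect_ratio


def image_points(width, height, file_size_bytes,
                 max_aspect_ratio=10.0, density_range=(0.05, 0.5), points=1.0):
    """Return points to add to the spam score; every threshold is an illustrative guess."""
    bytes_per_pixel, aspect_ratio = image_features(width, height, file_size_bytes)
    if aspect_ratio > max_aspect_ratio:
        return 0.0      # extreme shapes look like separators, not spam pictures
    low, high = density_range
    return points if low <= bytes_per_pixel <= high else 0.0
```

Because neither measure is definitive, the result is a small contribution to the overall score rather than a verdict.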
Spam traps
A spam trap is an e-mail address which never gets sent real e-mail. Thus any message in the spam trap must be spam. These messages are matched against other e-mail messages sent to the system and any message that matches significantly is rejected.
Spam Defence at the Recipient
Because only the final recipient of an e-mail can be the true arbiter of whether the mail is desired or not, personal anti-spam tools can make better choices for the individual than tools at the ISP.
Personal Virus Filters
Every e-mail recipient should have an up-to-date virus scanner on their computer. The chances are high that the virus scanner will be different to the ones used at the ISP, providing defence in depth for their PC. Also, by scanning your own files you get a second chance to catch viruses which have slipped through the service provider’s virus scanners.
Personal Mail filters
There are many commercial products that filter e-mail that you can run on your own machine. Some of these are built into e-mail viewers.
Learning mail filters
There is a class of e-mail filters that can learn about an individual’s preferences. These tools are taught a user’s preferences by the user marking mail as spam or desired e-mail. Various mathematical models can be built from the collection of good and spam e-mail. When a new e-mail is presented, the model is applied, and the model marks the mail as either good or spam. Some models include:
- Bayesian Analysis
- Linguistic Analysis
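As an illustration of the Bayesian approach, the Python sketch below implements a very small naive Bayes word filter. It is a teaching sketch under simplifying assumptions, not any particular product: the `NaiveBayesFilter` class, the tokeniser and the add-one smoothing are choices made for the example, and ties are resolved in favour of delivering the mail.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Tiny word-based Bayesian filter trained from a user's own markings."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "good": Counter()}
        self.message_counts = {"spam": 0, "good": 0}

    @staticmethod
    def tokens(text):
        """Split text into lower-case words."""
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record one message the user has marked as 'spam' or 'good'."""
        self.word_counts[label].update(self.tokens(text))
        self.message_counts[label] += 1

    def _log_score(self, words, label, vocabulary_size):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        total_messages = sum(self.message_counts.values())
        # Prior: how often this user marks mail with this label (add-one smoothed).
        score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
        for word in words:
            # Add-one smoothing so an unseen word does not zero out the probability.
            score += math.log((counts[word] + 1) / (total_words + vocabulary_size))
        return score

    def classify(self, text):
        """Return 'spam' or 'good' for a new message."""
        words = self.tokens(text)
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["good"])
        size = max(len(vocabulary), 1)
        spam = self._log_score(words, "spam", size)
        good = self._log_score(words, "good", size)
        return "spam" if spam > good else "good"
```

Each time the user marks a message, `train` is called with the message text and the chosen label, so the model drifts towards that individual's idea of what is and is not wanted.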