Email Guides and Essays
by Kaitlin Duck Sherwood,
including Top Ten Tips for Overcoming Email Overload




My Top Three Spam Filters

by Kaitlin Duck Sherwood


Note: Since I wrote this, I have noticed an amazing amount of activity in the area of anti-spam software. The second generation of anti-spam tools is much, much, MUCH more accurate than the previous generation of anti-spam tools. The latest crop of tools should do a much better job than you can do yourself. Read more in my essay, Second Generation Anti-Spam Solutions.

I have found that about three-quarters of my spam has at least one of three main characteristics, described below. And, if I set up a "whitelist" -- a list of people whose mail I always want to see -- I get almost no false hits.

Note: At the end of this page, I have links to implementation details for several of the most popular email programs. Alas, not all email programs are capable of implementing all three filters. Outlook Express for Windows, in fact, can't do any of them except whitelisting. (Outlook Express for Mac OS, by contrast, can do all of them. Go figure.)

Here are three common characteristics of spam, and how to catch messages with these characteristics:

  1. Embedded images. A lot of spam has embedded images, while not many legitimate messages have images in them.

    Certain types of embedded images can even get you more spam. Instead of attaching an image to display, the spammer can have your email program download the image from their Web site. They can also tailor the link in the message to your email program, so that information about your email address is embedded in the link. Thus, when you open the message and your email program fetches the image from their Web site, they will know that your email address is "live." This makes your email address more valuable, and you will probably get even more spam!

    These images -- called "Web bugs" -- are frequently one pixel by one pixel and transparent, so you might not even know that they are there.

    To catch messages with embedded images, tell your filters to look for IMG in the body of the message. (IMG is the HTML code for embedding an image.)

    As long as you don't discuss the city of Primghar, Iowa, this should only catch messages with images.

  2. No "real name". Most legitimate messages' From: lines have a "real name", e.g.
    	From: "Mabel Garcia" 
    
    while most spam does not, e.g.
    	From: m298sjk23@flossrecycling.com
    

    To find messages with no "real name", look for messages that have no spaces in the From: line.

    Unfortunately, AOL users never have a "real name", so their From: lines look something like this:

    	From: marblesgarcia@aol.com
    

    Messages from @yahoogroups.com addresses -- which many mailing lists use -- also frequently don't have a "real name".

    This isn't deadly -- you just have to make an exception for AOL and yahoogroups addresses.

    If you find that you get too much legitimate mail from people with no "real name", you might try combining the subject garbage filter (below) with this filter so that the spam filter only catches messages with no real name AND subject garbage.

  3. Garbage at the end of the Subject line. A lot of spam has some random numbers and letters way off on the right of the Subject: line. For example,
    	Subject: Make money fast!!!!               xg723wp
    
    This garbage has two possible uses:
    1. To camouflage the message subject. To block further instances of a reported spam, some ISPs will compute a number called a "checksum" for the spam. (Checksums are like an extreme form of data compression.) They generate a checksum on all incoming messages and compare it to their list of known spam checksums; if there is a match, they block the message. Making each message unique in subtle ways -- like adding a number to the subject line -- changes the checksum so that this kind of automated blocking can't work.
    2. To track the success of different advertising campaigns. A tracking ID can help a company figure out how to make their spam "better" from their point of view (i.e. more likely to be read, responded to, and/or acted upon).
    To find messages with subject garbage, look for Subject: lines with lots of spaces. How many? That depends on how aggressive you want to be. I look for seven spaces.
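
For readers who think in code, the three tests above, plus the whitelist and the AOL/yahoogroups exception, can be sketched in Python. This is only an illustration of the logic, not how any particular email program implements its filters. The function names, the whitelist entry, and the example.com address are made up for the example, and the sketch assumes that "seven spaces" means seven spaces in a row.

```python
from email.utils import parseaddr

# Hypothetical settings for illustration; substitute your own.
WHITELIST = {"mabel@example.com"}                    # people whose mail you always want
NO_NAME_EXCEPTIONS = ("@aol.com", "@yahoogroups.com")
SUBJECT_SPACE_RUN = 7                                # raise this to be less aggressive


def has_embedded_image(body):
    """Filter 1: the letters IMG appear anywhere in the body (any case)."""
    return "img" in body.lower()


def lacks_real_name(from_value):
    """Filter 2: the From: value has no spaces, except for exempt domains."""
    value = from_value.strip()
    _, address = parseaddr(value)
    if address.lower().endswith(NO_NAME_EXCEPTIONS):
        return False
    return " " not in value


def has_subject_garbage(subject):
    """Filter 3: the Subject: value contains a long run of spaces."""
    return " " * SUBJECT_SPACE_RUN in subject


def spam_reasons(from_value, subject, body):
    """Return the list of characteristics that matched; empty means 'keep'."""
    _, address = parseaddr(from_value)
    if address.lower() in WHITELIST:
        return []  # whitelisted senders are never flagged
    reasons = []
    if has_embedded_image(body):
        reasons.append("embedded image")
    if lacks_real_name(from_value):
        reasons.append("no real name")
    if has_subject_garbage(subject):
        reasons.append("subject garbage")
    return reasons
```

Calling spam_reasons with the From:, Subject:, and body text of a message returns the names of the characteristics that matched. Note that the simple IMG test also matches the word "Primghar", just as a real filter would. To require both "no real name" AND "subject garbage", as suggested above for people who get a lot of mail without real names, check that both reasons are in the returned list before treating the message as spam.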

There is some danger in setting up these filters -- some email programs don't notice spaces in filters, which makes the "real name" and "subject garbage" filters more difficult. Be sure to read the detailed instructions on how to set these filters up in your own email program: