Email Guides and Essays by Kaitlin Duck Sherwood
World Wide Webfoot Press
My Top Three Spam Filters
by Kaitlin Duck Sherwood
Note: Since I wrote this, I have noticed an amazing amount of activity
in the area of anti-spam software. The second
generation of anti-spam tools is much, much, MUCH
more accurate than the previous generation of anti-spam tools. The latest
crop of tools should do a much better job than you can do yourself.
Read more in my essay,
Second Generation Anti-Spam Solutions.
I have found that about three-quarters of my spam has at least one of the
three characteristics described below. And if I also set up a "whitelist" -- a list of
people whose mail I always want to see -- I get almost no false hits
(legitimate messages caught as spam).
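A whitelist check takes only a few lines of code. The sketch below is an illustration, not how any particular email program implements it. It assumes the whitelist is a plain set of lowercase addresses, and `WHITELIST` and `is_whitelisted` are hypothetical names. It uses Python's standard `email.utils.parseaddr` to pull the bare address out of a From: line, so the same person matches whether or not a "real name" is present.

```python
from email.utils import parseaddr

# Hypothetical whitelist: lowercase addresses of people whose mail you always want to see.
WHITELIST = frozenset({"mabel@example.com", "marblesgarcia@aol.com"})


def is_whitelisted(from_header: str, whitelist=WHITELIST) -> bool:
    """Return True if the sender's address is on the whitelist.

    parseaddr() splits a From: value such as '"Mabel Garcia" <mabel@example.com>'
    into ("Mabel Garcia", "mabel@example.com"); only the address is compared,
    case-insensitively.
    """
    _real_name, address = parseaddr(from_header)
    return address.lower() in whitelist


print(is_whitelisted('"Mabel Garcia" <Mabel@Example.com>'))  # True
print(is_whitelisted("m298sjk23@flossrecycling.com"))        # False
```

Messages that pass this check would skip the spam filters entirely, which is what keeps the false hits down.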
Note: At the end of this page, I have links to implementation details for
several of the most popular email programs. Alas, not all email programs
are capable of implementing all three filters. Outlook Express for Windows, in
fact, can't do any of them; the only thing it can do is whitelisting. (Outlook Express for
Mac OS, by contrast, can do all of them. Go figure.)
Here are three common characteristics of spam, and how to catch messages
with these characteristics:
- Embedded images. A lot of spam has embedded images, while not
many legitimate messages have images in them.
Certain types of embedded images can even get you more spam.
Instead of attaching an image to display, the spammer can have your
email program download the image from their Web site. They can also
tailor the image link in each copy of the message, so that information
about your email address is embedded in the link. Thus, when you
open the message and your email program fetches the image from their
Web site, they will know that your email address is "live." This makes
your email address more valuable, and you will probably get even more
spam!
These images -- called "Web bugs" -- are frequently one pixel by one pixel and
transparent, so you might not even know that they are there.
To catch messages with embedded images, tell your filters to look for
IMG in the body of the message. (IMG is the HTML tag for
embedding an image.)
As long as you don't discuss the city of Primghar, Iowa (whose name
happens to contain "img"), this should only catch messages with images.
- No "real name". Most legitimate messages' From: lines
have a "real name", e.g.
From: "Mabel Garcia"
while most spam does not, e.g.
From: m298sjk23@flossrecycling.com
To find messages with no "real name", look for messages that have no spaces
in the From: line (not counting the space after "From:" itself).
Unfortunately, AOL users never have a "real name", so their From: lines
look something like this:
From: marblesgarcia@aol.com
Messages from @yahoogroups.com addresses, which many mailing
lists use, also frequently don't have a "real name".
This isn't deadly -- you just have to make an exception for AOL and
yahoogroups addresses.
If you find that you get too much legitimate mail from people with no "real
name", you might try combining the subject garbage filter (below) with this
filter so that the spam filter only catches messages with no real name AND
subject garbage.
- Garbage at the end of the Subject line. A lot of
spam has some random numbers and letters way off on the right of the
Subject: line, pushed there by a long run of spaces. For example,
Subject: Make money fast!!!! xg723wp
This garbage has two possible uses:
- To camouflage the message subject. To block further copies of
a reported spam, some ISPs compute a number called a "checksum" for
the spam. (A checksum is a kind of fingerprint: a short number calculated
from the message's contents.)
They compute a checksum for every incoming message and compare it to their
list of known spam checksums; if there is a match, they block the message.
Making each message unique in subtle ways -- like adding a number to the
subject line -- changes the checksum so that this kind of automated blocking
can't work.
- To track the success of different advertising campaigns. A tracking
ID can help a company figure out how to make their spam "better" from their
point of view (i.e. more likely to be read, responded to, and/or acted upon).
To find
messages with subject garbage, look for Subject: lines with lots
of spaces. How many? That depends on how aggressive you want to be. I
look for seven spaces.
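The checksum blocking described under subject garbage, and the reason random garbage defeats it, can be shown in a few lines. This is a minimal sketch under stated assumptions: the essay doesn't say which checksum ISPs use, so a SHA-256 digest from Python's standard `hashlib` stands in for it, and `checksum`, `known_spam`, and `is_known_spam` are hypothetical names.

```python
import hashlib


def checksum(message: str) -> str:
    """Fingerprint a message. SHA-256 is a stand-in for whatever checksum an ISP uses."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


# The ISP records the checksum of a spam that someone reported.
reported = "Subject: Make money fast!!!!\n\nSend money today."
known_spam = {checksum(reported)}


def is_known_spam(message: str) -> bool:
    return checksum(message) in known_spam


# An exact copy of the reported spam is blocked...
print(is_known_spam(reported))  # True

# ...but the same spam with garbage tacked onto the Subject: line is not.
variant = "Subject: Make money fast!!!!        xg723wp\n\nSend money today."
print(is_known_spam(variant))  # False
```

Because every recipient can get different garbage, no two copies share a checksum, so a list of reported checksums never catches the next copy.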
Setting up these filters has one catch: some email programs
ignore spaces in filter text, which makes the "real name" and "subject
garbage" filters more difficult to set up.
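Outside an email program's filter dialog, spaces are no problem. As a rough illustration of the rules on this page, here are the three filters, the AOL/yahoogroups exception, the whitelist, and the optional "no real name AND subject garbage" combination as Python functions. This is not code from any email program, and several details are assumptions: messages are parsed with Python's standard `email` package, the IMG match ignores case, and "seven spaces" is read as a run of seven consecutive spaces (what typing seven spaces into a "contains" filter would match). All function names and the `SPACE_RUN` and `NO_REAL_NAME_OK` values are hypothetical.

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr

SPACE_RUN = " " * 7  # assumed: seven consecutive spaces; fewer is more aggressive
NO_REAL_NAME_OK = ("@aol.com", "@yahoogroups.com")  # exempt from the "real name" test


def body_text(msg: Message) -> str:
    """Join the decoded text of every text/* part (plain text and HTML)."""
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def has_embedded_image(msg: Message) -> bool:
    # Filter 1: "IMG" anywhere in the body, ignoring case (so "Primghar" matches too).
    return "img" in body_text(msg).lower()


def has_no_real_name(msg: Message) -> bool:
    # Filter 2: no spaces in the From: value, except for AOL and yahoogroups senders.
    sender = str(msg.get("From", "")).strip()
    _, address = parseaddr(sender)
    if address.lower().endswith(NO_REAL_NAME_OK):
        return False
    return " " not in sender


def has_subject_garbage(msg: Message) -> bool:
    # Filter 3: a long run of spaces in the Subject: line.
    return SPACE_RUN in str(msg.get("Subject", ""))


def looks_like_spam(msg: Message, whitelist=frozenset(), combine=False) -> bool:
    _, address = parseaddr(str(msg.get("From", "")))
    if address.lower() in whitelist:
        return False  # whitelisted senders always get through
    if has_embedded_image(msg):
        return True
    if combine:
        # Stricter variant: flag only messages with no real name AND subject garbage.
        return has_no_real_name(msg) and has_subject_garbage(msg)
    return has_no_real_name(msg) or has_subject_garbage(msg)


raw = (
    "From: m298sjk23@flossrecycling.com\n"
    "Subject: Make money fast!!!!        xg723wp\n"
    "\n"
    "Act now.\n"
)
print(looks_like_spam(message_from_string(raw)))  # True
```

With `combine=True`, a message from someone with no real name gets through unless its subject also has garbage, which trades a little more spam for fewer false hits.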
Be sure to read the detailed instructions on how to
set these filters up in your own email program: