31 August, 2007

Out, out, damn spam

From The New Yorker, Annals of Technology:
Graham compared every character—dashes, apostrophes, numbers, symbols—in thousands of genuine e-mails with those in thousands of pieces of spam. He was able to train his software to use the context of a message to guess how likely it was that an e-mail containing certain words in relation to each other was spam. The words “republic” and “madam” seem innocent enough, but when they appear together in an e-mail they are often from a Nigerian huckster who has addressed his e-mail “Dear Sir or Madam.” Mail like that is invariably spam.

As filters become more sophisticated, spam becomes more elusive. There are millions of ways to write a word using punctuation, numbers, and other symbols. One mathematically minded blogger who looked into it found that there are 600,426,974,379,824,381,952 ways to spell Viagra. “If I thought that I could keep up current rates of spam filtering, I would consider this problem solved,” Graham wrote. “But it doesn’t mean much to be able to filter out most present-day spam, because spam evolves.” Indeed, most anti-spam techniques so far have been like pesticides that do nothing other than create a more resistant strain of bugs.
Read it all.
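For the curious, here is a minimal Python sketch of the kind of word-probability filter the excerpt describes. It is loosely modeled on the scheme Graham laid out in his essay "A Plan for Spam"; it is not his code. A few assumptions to flag: the class `TinySpamFilter`, its parameters and the sample messages are all hypothetical. The constants are recalled from that essay and are worth checking against it: counts from good mail are doubled, word probabilities are clamped to 0.01..0.99, rare or unseen words get 0.4, and the 15 most "interesting" words are combined. It also scores single words independently, so "republic" plus "madam" push a message toward spam only by combining their separate probabilities, not by modeling the pair.

```python
import re
from collections import Counter
from math import prod

# Letters, digits, dashes, apostrophes and dollar signs stay inside tokens;
# everything else separates them. Lowercasing is a simplification.
TOKEN_RE = re.compile(r"[A-Za-z0-9'$-]+")


def tokenize(text):
    return [t.lower() for t in TOKEN_RE.findall(text)]


class TinySpamFilter:
    """Hypothetical minimal filter in the spirit of 'A Plan for Spam'."""

    def __init__(self, min_count=5, unseen_prob=0.4, n_interesting=15):
        self.good = Counter()  # token -> occurrences in legitimate mail
        self.bad = Counter()   # token -> occurrences in spam
        self.ngood = 0         # number of legitimate messages trained on
        self.nbad = 0          # number of spam messages trained on
        self.min_count = min_count
        self.unseen_prob = unseen_prob
        self.n_interesting = n_interesting

    def train(self, text, is_spam):
        if is_spam:
            self.bad.update(tokenize(text))
            self.nbad += 1
        else:
            self.good.update(tokenize(text))
            self.ngood += 1

    def token_prob(self, tok):
        g = 2 * self.good[tok]  # doubling biases against false positives
        b = self.bad[tok]
        if g + b == 0 or g + b < self.min_count:
            return self.unseen_prob  # too rare to trust: use the default
        pg = min(1.0, g / max(self.ngood, 1))
        pb = min(1.0, b / max(self.nbad, 1))
        return max(0.01, min(0.99, pb / (pg + pb)))

    def spam_probability(self, text):
        probs = [self.token_prob(t) for t in set(tokenize(text))]
        # Keep the tokens whose probabilities are farthest from a neutral 0.5.
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        top = probs[: self.n_interesting]
        # Naive Bayes style combination of the individual word probabilities.
        spam = prod(top)
        ham = prod(1.0 - p for p in top)
        return spam / (spam + ham)


if __name__ == "__main__":
    spam = [
        "Dear Sir or Madam, I am a minister of the republic and need your help to transfer funds",
        "Dear Sir or Madam, the republic owes you money, send your bank details",
    ]
    ham = [
        "Hi team, the meeting is moved to Tuesday, see the agenda attached",
        "Lunch on Tuesday? The team wants to try the new place",
    ]
    f = TinySpamFilter(min_count=1)  # toy corpus: trust every word seen
    for msg in spam:
        f.train(msg, is_spam=True)
    for msg in ham:
        f.train(msg, is_spam=False)
    print(f.spam_probability("Dear Madam, greetings from the republic"))  # close to 1
    print(f.spam_probability("Team meeting moved to Tuesday"))  # close to 0
```

The demo lowers `min_count` only because four training messages are far too few for the default. With thousands of real messages, as in the excerpt, that cutoff is what keeps one-off words from swinging the score.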

Thanks to Craig Newmark for the link. Craig notes,
Bill Gates infamously predicted in 2004 that the problem of spam would be solved "in about two years".
The image of where Bill pulled that one from is rather vivid.

1 comment:

Curious said...

Two years, eh? Let's see! I thought spammers would be given the Nobel Prize for Literature before someone figured out how to beat them!
