Spam poetry

Courtesy of The Register, this is a marvellous collection of spam poetry – little bits of spam emails that seem to have a deeper meaning. My personal favourite is the lovely phrase “Translucent gibbon rucksack bonanza”.

All this is because when Bayesian filtering started becoming popular, spam mail generators started writing code to generate nonsense that looked like English. This was done using a Markov chain nonsense generation system – you simply look through a piece of text, and analyse which words follow which other words. For example, in this article the word “this” is followed by the words “is”, “was” and “article”. Once you’ve done that, pick a word at random to start (e.g. “This”), then pick one of the next options (e.g. “is”), and repeat until bored. A sample bit of text from this article could read:

“Done using a word at random to have a sample bit of spam mail generators started writing code to have a sample bit of text from this is a marvellous….”
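
For the curious, the word-following trick described above can be sketched in a few lines of Python. This is only a minimal illustration, not what any real spam generator used: the function names `build_chain` and `generate`, the `max_words` limit and the sample sentence are all made up for the example.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, max_words=20, rng=random):
    """Walk the chain from `start`, picking a random follower at each step."""
    word = start
    output = [word]
    for _ in range(max_words - 1):
        followers = chain.get(word)
        if not followers:  # dead end: nothing ever followed this word
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Hypothetical sample text, chosen so that "this" has three followers.
sample = "this is a test and this was a test of this article"
chain = build_chain(sample)
print(sorted(set(chain["this"])))  # ['article', 'is', 'was']
print(generate(chain, "this", max_words=12))  # random nonsense, differs each run
```

Followers are stored with repeats, so a word that follows another more often is picked more often. The sketch is case-sensitive and keeps punctuation attached to words, so “This” and “this” count as different words unless you lowercase the text first.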

Not perfect sense, but it’s difficult to automatically distinguish from English. Despite this, the war against spam continues on new fronts, with Sender Policy Framework trying to ensure users only send mail as themselves (preventing spoofing of “From” addresses) – although this won’t prevent all spam, even when Microsoft adopts it. International agreements to do… well, something about the problem are being forged (although as these only include the United States, the United Kingdom and Australia, they’re unlikely to be terribly effective).

No doubt the fight will continue, although probably at some cost to both freedom of speech and the plain, simple convenience of email.
