Wednesday, March 12, 2008

One Problem Google Can't Solve: Spam

I get a fair amount of email -- on average between 1,000 and 2,500 emails a day. Most, as you might expect, are spam. (Not fan mail, alas.) As a result of this volume, and of years following the spam industry both as a journalist and as a vexed end user, I've become well versed in a number of the popular spam-fighting tools out there.

I'm particularly fond of tools that leverage the so-called "wisdom of crowds" to help combat spam: services that judge a message's likelihood of being spam based on the number of recipients who have marked it as such. These approaches have been commercially available for perhaps a decade, and have often been integrated with other antispam approaches (checksums, Bayesian filtering -- still my favorite -- and so on) at either the client or server level. Many ISPs integrate some form of these solutions at the server level without end users even knowing.
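To make the crowd idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular ISP or vendor does it: the class name `CrowdSpamTally`, the `min_reports` cutoff, and the fingerprinting scheme (a hash of the lowercased, whitespace-normalized subject and body) are all assumptions of mine.

```python
import hashlib
from collections import defaultdict


class CrowdSpamTally:
    """Hypothetical tally of deliveries and spam reports per message fingerprint."""

    def __init__(self, min_reports=3):
        # Assumed cutoff: ignore messages that only a handful of people reported.
        self.min_reports = min_reports
        self.delivered = defaultdict(int)
        self.reported = defaultdict(int)

    @staticmethod
    def fingerprint(subject, body):
        # Lowercase and collapse whitespace so trivial variations hash the same.
        text = " ".join((subject + "\n" + body).lower().split())
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def record_delivery(self, fp):
        self.delivered[fp] += 1

    def record_report(self, fp):
        self.reported[fp] += 1

    def spam_likelihood(self, fp):
        """Fraction of recipients who reported the message (0.0 if too few reports)."""
        reports = self.reported[fp]
        if reports < self.min_reports or self.delivered[fp] == 0:
            return 0.0
        return min(1.0, reports / self.delivered[fp])
```

A server could consult `spam_likelihood` before delivering a message to the next recipient and divert anything above a threshold of its choosing.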

Not that it makes much difference: Most of us who have been on the Internet for some time know that, inevitably, the spam will find its way into our inboxes.

No matter how finely tuned your spam filter may be, no matter how wise your crowd (with apologies to James Surowiecki), your spam filter will one day fail. Perhaps, like mine and most of ours, it fails every day.

But it shouldn't fail for a lack of connecting the dots. Which is why I am baffled that Google's Gmail seems to ignore the fact that I receive the same spams for weeks on end. I'm not talking about the same scam (e.g., the nephew of a recently deposed dictator has selected YOU to help him move millions out of his country!) but the same message. Same subject; same intentional misspellings; largely the same body content (I'd estimate that 90% of the same bizarre spellings and word choices are repeated).
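Spotting a message that is "roughly 90% the same" as one I've already reported is not exotic. Here is a rough sketch using Python's standard `difflib` module; the function names and the 0.9 threshold are my own assumptions for illustration, and I have no idea what Gmail actually does internally.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity ratio between two messages, compared word by word."""
    words_a = a.lower().split()
    words_b = b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def is_repeat_of_reported(message, reported_messages, threshold=0.9):
    """True if message is at least `threshold` similar to any previously reported one."""
    return any(similarity(message, old) >= threshold for old in reported_messages)
```

Comparing every incoming message against every reported one would not scale to a service of Gmail's size, where fuzzy hashing or shingling would be more plausible, but the principle is the same.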

I'm clicking "Report Spam" in each instance -- presumably, this flags each message as spam and ensures that Gmail's servers are more likely to mark them as spam as they're sent to additional users. But it's unclear why I continue to receive the same blasted spam. Am I helping other users avoid similar spam, but not myself? I couldn't be the only user receiving these spams and flagging each with the "Report Spam" button... could I? (Doubtful, considering spammers' businesses depend entirely on bulk mailing. But it does make one wonder.)

Additionally, why is it that Gmail doesn't weed out unfiltered spams from my Inbox that others have flagged -- before I read them? If thousands of other users report a message as spam, and I have that same message sitting unread in my Inbox, why is it that I have to report or delete it myself? Why hasn't that spam still sitting, unread, in my Inbox since 2006 (yes, this is true) been detected and destroyed? Why isn't the wisdom of crowds being leveraged in this fashion?
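The retroactive sweep I'm wishing for could be as simple as the sketch below. Everything in it is hypothetical: the `Message` record, the `report_counts` mapping from a message fingerprint to the number of users who reported it, and the `min_reports` threshold of 1,000 are stand-ins I made up to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical mailbox entry; fingerprint identifies identical copies across users."""
    msg_id: str
    fingerprint: str
    unread: bool = True
    folder: str = "inbox"


def sweep_inbox(messages, report_counts, min_reports=1000):
    """Move unread inbox messages that the crowd has since flagged into the spam folder."""
    moved = []
    for msg in messages:
        flagged = report_counts.get(msg.fingerprint, 0) >= min_reports
        if msg.folder == "inbox" and msg.unread and flagged:
            msg.folder = "spam"
            moved.append(msg.msg_id)
    return moved
```

Restricting the sweep to unread messages is a deliberate choice in this sketch: anything I've already opened stays where I left it.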

Sigh. While I'm fantasizing, why can't Google calculate (or expose, if it exists already in secret) a "spam likelihood" ranking for each piece of email? This is a feature I've long appreciated in antispam tools like SpamBayes for Outlook, and it further helps me spot the emails that slip into my inbox but are likely to be spam.

Without knowing Gmail's spam-likelihood ranking for e-mails, I forever wonder whether a low-probability-of-spam message (which might just as likely be a legitimate email) is being treated with the same prejudice as a high-probability missive from the nephew of a recently deposed dictator.
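For the curious, a per-message spam likelihood can be sketched with a tiny naive Bayes classifier. To be clear, this is a textbook simplification with add-one smoothing, not the actual algorithm used by SpamBayes or by Gmail, and the class name `TinyBayes` and its methods are invented for this example.

```python
import math
from collections import Counter


class TinyBayes:
    """Hypothetical word-count naive Bayes scorer for illustration."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham".
        self.words[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text):
        """Return an estimate of P(spam | text) between 0 and 1."""
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        total_msgs = self.messages["spam"] + self.messages["ham"]
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior, so an untrained class never produces log(0).
            log_p = math.log((self.messages[label] + 1) / (total_msgs + 2))
            total_words = sum(self.words[label].values())
            for word in text.lower().split():
                count = self.words[label][word]
                # Add-one smoothing; the extra +1 reserves mass for unseen words.
                log_p += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_score[label] = log_p
        # Turn the two log scores into a probability, clamped to avoid overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))
```

Showing that number next to each message, rather than hiding it behind a yes-or-no verdict, is all I'm asking for.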

I realize that Google may rely on secrecy to keep the inner workings of its antispam efforts free from the prying eyes of the scammers themselves. And I know the company has taken steps in recent months to make its work more visible to us users. (Also, check this out.) But for all of Google's efforts (which include support for multiple authentication systems like Sender Policy Framework, DomainKeys, and DomainKeys Identified Mail, among a slew of other features), clearly even it hasn't come up with a truly killer idea in spam.

And perhaps, considering how highly we all regard Google, that might be the most disheartening fact of spam yet.
