Gmail Fires Back in the War on Spam

In the competition between top Web-based e-mail providers, spam filtering has emerged as a crucial battlefield.

After a Microsoft-sponsored bake-off that put its Hotmail service a hair ahead of Google’s Gmail in spam blocking, Google has stepped forward with its own statistics, which it says show that Gmail is the antispam leader.

Google told Gadgetwise that its internal data indicated that less than 1 percent of the e-mail in the average Gmail inbox was spam, and that its rate of “false positives,” or legitimate messages wrongly tossed as spam, was also less than 1 percent. Microsoft said in February that less than 3 percent of e-mail in the average Hotmail inbox was spam. It declined to provide a false-positive rate.

Google says its lead is even wider because Gmail’s definition of spam is more stringent than Hotmail’s. Microsoft’s figures measure unsolicited commercial e-mail, while Gmail’s figures include unsolicited mail as well as solicited mail that users nonetheless regard as spam, so-called “gray mail.”

Gmail’s spam fight has been shaped from the start by a belief that the best definition of spam is “what our users say it is,” said Bradley Taylor, an early architect of Gmail’s antispam system. Accordingly, to tune its spam filters, Gmail has relied heavily on user reports of junk messages, which users make by checking the box next to the offending message and then clicking “Report Spam.” It has not given a free pass to annoying e-mails from legitimate senders simply because users once gave those senders their addresses.

“Besides being really good at getting rid of spam,” Gmail’s definition and filtering approach “got rid of unwanted mail,” Mr. Taylor said.

Gmail prides itself on its champion spam-fighting abilities. The service was born in 2004 in part out of the notion that Google could eliminate spam better than other available services, which were cluttered with the stuff at the time. Since then, the whole industry has made major strides in battling the problem.

Gmail’s filters lean heavily on sender reputation, generously informed by user spam reports, an approach that enables Gmail to offer both systemwide and personalized spam filters to its 350 million active users. Its broad definition of spam has been controversial, particularly with e-mail senders. But senders have adapted, Mr. Taylor said, and as a result are more careful to send messages people want to receive. To help establish reputation, Gmail also uses several technical tools to “authenticate” sender identities and block spammers and phishers who “spoof,” or forge, their addresses.

Hotmail also uses authentication technologies and other tools to assess sender reputation and block messages from I.P. addresses with bad reports. While it collects and uses spam reports from its users, it relies on them less heavily than Gmail does, in order to avoid blocking commercial messages that some people want. To help users manage gray mail, Hotmail offers several tools, including a special newsletter filter, a one-click unsubscribe feature and a scheduled cleanup tool that will remove old e-mails from a given sender, such as a daily deals site. (Gmail also has unsubscribe and filtering features.)

Pradeep Kyasanur, who heads Gmail’s spam team, says Gmail is focused on keeping false positives low, since missing legitimate messages is a big irritation for users, especially business users. The large number of spam reports (and not-spam reports) it receives helps it separate spam from legitimate mail with a high degree of accuracy, he said.

“It’s definitely a complex problem and occasionally we get things wrong,” Mr. Kyasanur said. “But on balance I think it helps the users and helps our system to get it right.”

To encourage users to file more reports, and more accurate ones, and to keep them safe from malware and phishing, Gmail has refined how it communicates with users about e-mail risks and how its antispam filters work. Last month, it began displaying on each message labeled spam a brief explanation of why the message was blocked, giving users an opportunity to correct the classification. Reasons include: “It might contain a virus or malicious link,” “It’s similar to messages that were detected by our spam filters,” and “You previously marked messages from [address] as spam.”