WhatsApp turned on end-to-end encryption by default last April for its more than 1 billion users. If WhatsApp couldn’t read the contents of its users’ messages anymore, how would it detect and fight spam on the platform? WhatsApp could have become a haven for scammers pushing pills and get-rich-quick schemes, which would have driven users off the platform and harmed its business even more than short-term court-ordered shutdowns.
Instead, WhatsApp developed approaches to detecting spam that don’t rely on content at all, says WhatsApp engineer Matt Jones. Rather than looking at what a message says, WhatsApp analyzes behavior for indications that a user might be spamming. The approach is working surprisingly well: Jones says that WhatsApp slashed spam by 75 percent after launching end-to-end encryption.
Some of WhatsApp’s behavioral detection systems will sound familiar to anti-spam experts. For instance, WhatsApp looks at how many messages a user is sending and will flag the user as a likely spammer if the number of messages per minute is unusually high, a common anti-spam strategy. But WhatsApp also uses a number of other signals to determine the probability that a message is spam.
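To make the messages-per-minute idea concrete, here is a minimal sketch of a sliding-window rate check. It is not WhatsApp's implementation: the class name `RateFlagger`, the threshold of 30 messages, and the 60-second window are all assumptions chosen for illustration. The point is that it needs only a sender identifier and a timestamp, never the message content.

```python
from collections import deque


class RateFlagger:
    """Flags a sender whose sends within a sliding time window exceed a limit.

    Hypothetical illustration only; the default thresholds are assumptions.
    """

    def __init__(self, max_per_window=30, window_seconds=60.0):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._sent = {}  # sender id -> deque of recent send timestamps

    def record(self, sender, timestamp):
        """Record one send and return True if the sender is over the limit."""
        times = self._sent.setdefault(sender, deque())
        times.append(timestamp)
        # Drop timestamps that have aged out of the window.
        while times and timestamp - times[0] >= self.window_seconds:
            times.popleft()
        return len(times) > self.max_per_window
```

A caller would invoke `record(sender, timestamp)` once per outgoing message and treat a `True` result as one signal among several, not as proof of spam.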
WhatsApp examines data related to the internet service provider (ISP), the phone number, and the phone network being used, and compares it with previous spam reports. If the ISP data or the phone prefix (the first several digits of a phone number) has previously been associated with spammers, it’s likely that new messages associated with that data are spam too. WhatsApp will also take notice if, for example, a phone with a Canadian country code connects via a cell network in Thailand, and will assess the probability that the user is a spammer rather than a traveller on vacation.
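One common way to turn several such signals into a single probability is a logistic model, sketched below. This is an assumption about how signals like these could be combined, not a description of WhatsApp's actual model. The signal names and every weight are hypothetical; a real system would learn the weights from past spam reports.

```python
import math

# Hypothetical log-odds values, made up for illustration.
PRIOR_LOG_ODDS = -4.0  # most accounts are assumed not to be spammers
WEIGHTS = {
    "isp_reported_before": 2.5,       # ISP data previously tied to spam reports
    "prefix_reported_before": 2.0,    # phone prefix previously tied to spam reports
    "country_network_mismatch": 1.0,  # e.g. Canadian number on a Thai network
}


def spam_probability(signals):
    """Combine boolean metadata signals into a probability with a logistic model."""
    log_odds = PRIOR_LOG_ODDS + sum(
        weight for name, weight in WEIGHTS.items() if signals.get(name, False)
    )
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With these made-up weights, a country and network mismatch alone raises the probability only slightly, which matches the idea that the user may simply be travelling. Several signals firing together push it past one half.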
Once a spammer is reported, WhatsApp will also go back and look at the spammer’s earlier actions on the platform for clues about why the account wasn’t caught sooner, then feed that information into its model. “Every message they sent before was an opportunity to prevent spam that we failed to take,” Jones said.
WhatsApp bans users based on these probabilistic models, and if the company makes a mistake, users can appeal the ban. Jones said that WhatsApp has also cut back on mistaken bans through its enhanced spam detection.
However, this approach relies heavily on the analysis of metadata (the non-content information associated with transmitting a message), and WhatsApp has been criticized for hanging on to users’ metadata and sharing it with Facebook. End-to-end encryption only guarantees the privacy of message content, not metadata, but many non-technical users might not understand the difference and may be surprised to learn how WhatsApp collects and analyzes their information.
Open Whisper Systems, the maker of the encrypted chat app Signal and the Signal Protocol (on which WhatsApp’s encryption is based), recently released the first subpoena it received along with its response. The documents showed that OWS doesn’t keep metadata on its users: all that the company could hand over was the account creation date and the last log-in time.
Harvesting metadata is a trade-off. As OWS grows, it may find itself struggling with a spam problem. And WhatsApp will have to balance users’ expectations of privacy with their demand for a spam-free experience.