BOSTON–Spam has been a major problem since virtually the dawn of the commercial Web and the advent of widespread email use, and it has grown to the point that botnets now spew trillions of spam messages every month. But email spam is just one piece of a much larger ecosystem, one now dominated by Web-based spam that pushes users to malicious or, at best, worthless pages.
Web-based spam differs from email spam in several fundamental ways, but the goal of the people behind it is the same as that of the botnet operators cramming inboxes with junk messages: dollar dollar bills. Much of the Web spam problem right now is related to the scareware epidemic in one way or another, and many of the black hat SEO and SEO spam campaigns that occur daily are used to push rogue antivirus or scareware products.
“The motivation is money, either syndication or ad-based,” Sasi Parthasarathy of Microsoft’s Bing said in a talk at USENIX LEET here Tuesday describing the challenges of defending against the spammers.
The scareware gangs have been focusing their energy for a couple of years now on tying their campaigns to major news events such as the earthquake in Japan or the death of Elizabeth Taylor. But they’re not foolish enough to wait for the next big event to get their campaigns ready to go. Instead, they hijack pages on legitimate sites in advance. Then, when an event occurs, they stuff those pages full of relevant keywords and insert links that redirect users to a scareware site.
Once on the scareware site, the user could be compromised via a drive-by download that makes the machine part of a botnet, or simply be presented with a demand for payment to remove nonexistent threats from the PC.
The techniques and tactics used to create link farms, content spam and search-engine spam are well understood by researchers and engineers at search engine operators, but they’re also quite difficult to combat in real time, Parthasarathy said.
“It’s all about getting authority with the search engine,” Parthasarathy said. “One of the big things we see is these networks of link farm sites that are all related and they just link back and forth to each other to build authority. There’s no valid content, but they have all the right keywords.”
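To make the link farm pattern concrete, here is a minimal sketch of one possible signal: how many of a site's outbound links are linked back. This is a toy heuristic for illustration, not Bing's method. The function names, the thresholds and the example hostnames are all assumptions.

```python
from typing import Dict, List, Set


def reciprocity(graph: Dict[str, Set[str]], site: str) -> float:
    """Fraction of a site's outbound links that link back to it."""
    outbound = graph.get(site, set()) - {site}  # ignore self-links
    if not outbound:
        return 0.0
    mutual = {t for t in outbound if site in graph.get(t, set())}
    return len(mutual) / len(outbound)


def suspected_farm(graph: Dict[str, Set[str]],
                   threshold: float = 0.8,
                   min_links: int = 2) -> List[str]:
    """Sites with enough outbound links, most of which are reciprocated.

    threshold and min_links are made-up values for this sketch.
    """
    return sorted(
        site for site in graph
        if len(graph[site] - {site}) >= min_links
        and reciprocity(graph, site) >= threshold
    )


# Hypothetical link graph: host -> hosts it links to.
links = {
    "farm-a.example": {"farm-b.example", "farm-c.example"},
    "farm-b.example": {"farm-a.example", "farm-c.example"},
    "farm-c.example": {"farm-a.example", "farm-b.example"},
    "news.example": {"wiki.example", "farm-a.example"},
    "wiki.example": set(),
}

print(suspected_farm(links))
# ['farm-a.example', 'farm-b.example', 'farm-c.example']
```

Legitimate sites also link to each other, so a signal like this would produce false positives if used alone, which is the risk of harming good sites that Parthasarathy describes.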
Parthasarathy said that Microsoft uses both algorithmic and human approaches to detecting spam links, link farms, useless pages without valid content and SEO spam. One of the challenges that Bing, Google and other search engine operators face in this work is ensuring that they don’t mistakenly discount legitimate sites with valid content while still weeding out malicious or spammy links and pages. It’s not a simple process, Parthasarathy said.
“It’s very hard for a human judge to determine the relevance of links and identify link-level spam,” he said. “If you’re not careful you can end up harming good sites. It’s a major problem. Page-level spam is very easily identifiable. Link-level spam is super hard to find.”
In addition to setting up networks of interrelated link farms and spam pages, the spammers also will add content spam to sites. This can take the form of text set in white type on a white background, which is invisible to the user but is seen by a search engine crawler, or of machine-generated content such as keywords taken from search engine query logs. This content often makes no sense and looks like gibberish on a page, but it serves as an attraction for the crawler.
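The white-on-white trick can be illustrated with a short sketch that flags text whose inline foreground and background colors match. It is a simplification built on assumptions: it reads only inline `style` attributes and compares color strings literally, while a real crawler would need the page's computed styles. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


def parse_style(style):
    """Turn 'color: #fff; background-color: #fff' into a lowercase dict."""
    props = {}
    for decl in style.split(";"):
        if ":" in decl:
            name, value = decl.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    return props


class HiddenTextFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []        # one bool per open element: is its text hidden?
        self.hidden_text = []  # text chunks found inside hidden elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        props = parse_style(dict(attrs).get("style") or "")
        fg = props.get("color")
        bg = props.get("background-color")
        same_color = fg is not None and fg == bg
        inherited = bool(self.stack) and self.stack[-1]
        self.stack.append(same_color or inherited)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html):
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text


page = (
    "<p>Visible news</p>"
    '<div style="color: #FFF; background-color:#fff">'
    "cheap pills <b>free download</b></div>"
)
print(find_hidden_text(page))  # ['cheap pills', 'free download']
```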
The bottom line, Parthasarathy said, is that Web-based spam is a massive, distributed problem that needs a large-scale solution.
“Blocking single URLs isn’t a scalable solution,” he said. “We want to aggregate. Neutralizing spam without hurting users is hard.”
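Parthasarathy did not say in the talk what Bing aggregates on. As one illustration of the idea, the sketch below groups flagged URLs by hostname and reports only hosts that cross a threshold. Grouping by host, the `min_hits` value and the example URLs are all assumptions.

```python
from collections import Counter
from urllib.parse import urlsplit


def hosts_to_block(flagged_urls, min_hits=3):
    """Count flagged URLs per hostname and keep hosts at or above min_hits."""
    hits = Counter()
    for url in flagged_urls:
        host = urlsplit(url).hostname
        if host:  # skip strings that carry no hostname
            hits[host] += 1
    return {host: n for host, n in hits.items() if n >= min_hits}


# Hypothetical flagged URLs.
flagged = [
    "http://spam.example/a",
    "http://spam.example/b",
    "http://spam.example/c",
    "http://blog.example/hijacked",
]
print(hosts_to_block(flagged))  # {'spam.example': 3}
```

A legitimate site with a single hijacked page, like the hypothetical blog.example here, stays below the threshold. A raw count is crude, though, and a real system would likely weigh flagged pages against everything else known about the host.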