Even an enterprise with the strictest blocking rules in place is likely to leave the door ajar for Google’s search crawler, known as Googlebot.

Googlebots crawl websites, collecting data along the way to build the searchable index that ensures a site will be listed and ranked on the search engine.

Hackers have taken notice of the access afforded to these crawlers and are using spoofed Googlebots to launch application-layer distributed denial of service attacks with greater frequency.

Research released today from web security firm Incapsula identifies this as a growing trend among attackers; for every 25 genuine Googlebot visits, companies are likely to be visited by one fake. Almost a quarter of those phony Googlebots are used in DDoS attacks, making the fake crawler the third most popular DDoS bot in circulation, according to product evangelist Igal Zeifman.

Zeifman said Incapsula is able to identify Googlebot imposters because Google crawlers come from a pre-determined IP address range. All of the fake ones are considered malicious, and have been used to initiate site scraping, spam and hacking in addition to DDoS.

Zeifman said attackers’ success with this approach is due to a combination of two things.

“One is the assumption you can have indiscriminate protection. Even if you provide harsh blocking rules, and say block all traffic from x country, you still leave some way for Google to get in because you want to appear on Google [search results],” Zeifman said. “Hackers are looking for a loophole. The more advanced [mitigation] tools are able to identify Googlebots, which is done by a cross-verification of IP addresses. But this also shows a low level of understanding by hackers of how modern DDoS protection works. They assume you can’t do IP cross verification.”
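Google publishes a way to do this cross-verification without maintaining a current IP list: a reverse DNS lookup on the claimed crawler’s IP should resolve to a hostname under googlebot.com or google.com, and a forward lookup on that hostname should return the original IP. A minimal sketch of that check in Python (the sample address is illustrative):

```python
import socket

def is_genuine_googlebot(ip: str) -> bool:
    """Cross-verify a claimed Googlebot IP with reverse-then-forward DNS."""
    try:
        # Reverse lookup: a real crawler IP maps to a Google-owned hostname.
        hostname = socket.gethostbyaddr(ip)[0]
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward lookup: the hostname must resolve back to the same IP,
        # otherwise the reverse record could simply be forged.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:  # covers socket.herror / socket.gaierror lookup failures
        return False

# 66.249.66.1 sits in a range Google's crawlers are known to use.
print(is_genuine_googlebot("66.249.66.1"))
```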

While network-layer DDoS attacks have reached ridiculous proportions, with amplification-driven floods reaching upwards of 400 Gbps of bad traffic, application-layer attacks don’t require nearly the same level of noise. Attackers can scout out a website’s resources and aim pinpoint attacks, for example by continually requesting a download of a particular form hosted on a site, or by hammering other resource-heavy pages. Website designers tend to over-provision for the number of visitors per second or minute they anticipate, but that anticipated number is rarely outrageous. It is therefore simple for an attacker to send more fake Googlebots at a resource than the page can handle.

“You don’t have to create a big flood to generate 5,000 visits per second,” Zeifman said. “It’s easy to generate 5,000 per second. Layer 7 attacks are more common for sure than Layer 3 or 4 events. The reason is that it’s easier to execute and more dangerous, even in low volumes.”
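This is also why rate limits on expensive endpoints are a common first line of defense at the application layer. The article does not describe Incapsula’s internals, so the sketch below is a generic per-client token bucket, with the endpoint and limits invented for illustration:

```python
import time
from collections import defaultdict

class TokenBucket:
    """Allow `rate` requests per second per client, with bursts up to `burst`."""
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens = defaultdict(lambda: burst)    # tokens left per client IP
        self.stamp = defaultdict(time.monotonic)    # last refill per client IP

    def allow(self, client_ip: str) -> bool:
        now = time.monotonic()
        elapsed = now - self.stamp[client_ip]
        self.stamp[client_ip] = now
        # Refill tokens earned since the last request, capped at the burst size.
        self.tokens[client_ip] = min(self.burst,
                                     self.tokens[client_ip] + elapsed * self.rate)
        if self.tokens[client_ip] >= 1:
            self.tokens[client_ip] -= 1
            return True
        return False

# Hypothetical limits for a heavy endpoint: 2 requests/sec, bursts of 5.
form_download = TokenBucket(rate=2, burst=5)
if not form_download.allow("203.0.113.7"):
    ...  # answer with HTTP 429 instead of rendering the expensive page
```

Rate limiting alone won’t stop a distributed flood, which is why the IP cross-verification above matters: a verified Googlebot can be exempted while fakes are throttled.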

There have been attacks, Zeifman said, in which hackers used network-layer and application-layer DDoS tactics simultaneously. Some of those attackers, he said, have also figured out how to beat application-layer DDoS mitigations that require the client to execute a JavaScript object in order to separate browsers from bots.

“Can you execute the JavaScript? If not, then you are a bot posing as a browser,” Zeifman said. “They’ve figured out how to attack these resources to remove this tool from our arsenal.”
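The article doesn’t detail how such a challenge is built, but the general technique looks roughly like the sketch below: answer a suspect request with a page whose inline JavaScript computes a token and retries with it in a cookie, then verify the answer server-side. The cookie name, nonce handling and toy computation are all invented for illustration:

```python
import secrets
from http.cookies import SimpleCookie

def make_challenge() -> tuple[str, str]:
    """Return (nonce, HTML page); a real browser runs the script and retries."""
    nonce = secrets.token_hex(8)
    page = f"""<html><body><script>
      var n = "{nonce}", t = 0;
      for (var i = 0; i < n.length; i++) t += n.charCodeAt(i) * 31;
      document.cookie = "js_ok=" + t + "; path=/";
      location.reload();  // retry the original request, now carrying the cookie
    </script></body></html>"""
    return nonce, page

def passes_challenge(nonce: str, cookie_header: str) -> bool:
    """Server-side check: did the client actually execute the JavaScript?"""
    cookies = SimpleCookie(cookie_header)
    expected = sum(ord(c) * 31 for c in nonce)  # same math the script performs
    return "js_ok" in cookies and cookies["js_ok"].value == str(expected)
```

In practice the nonce would be tied to a session and expire quickly; the point Zeifman makes is that attackers have found ways to take even this check out of the defenders’ arsenal.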


Comments (2)

  1. Notme

    The saddest part is that Google itself is partly to blame for this problem. In their efforts to guard their Adsense advertising fortunes, Google permits quite a few IPs that do not ‘whois’ to a Google name to cruise web sites, to ensure that what the Google search engine and the Adsense bot see is the same thing that is being shown to ‘users’. When an attentive network administrator catches these Google-sponsored (but not Google-branded) bots and BANS them, the web site is punished by Google and Adsense for blocking their spying bots.

    Attentive webmasters who closely monitor their server logs will notice that a visit by ‘the genuine Google Bot’ is usually followed shortly afterwards by up to ten other IPs all mysteriously requesting the same files. Basically your content is slurped down not once but by a grand total of 11 different bots. And when you check the IPs of those ‘follower bots’, you will find that they don’t ‘whois’ to anything related to Google or Adsense. So if you BAN those ten content-leeching bots, you will find Google itself or Adsense doling out punishment to your site by removing pages, lowering your site ranking or, in the case of Adsense, prohibiting you from running Adsense ads.
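    A few lines of scripting against the access log will surface the pattern; the tuple format and ten-minute window here are just for illustration:

    ```python
    from datetime import timedelta

    def tagalong_bots(entries, googlebot_ips, window=timedelta(minutes=10)):
        """Flag IPs that fetch the same file shortly after a verified Googlebot.

        `entries` is a list of (timestamp, ip, path) tuples parsed from the log.
        """
        googlebot_hits = [(t, p) for t, ip, p in entries if ip in googlebot_ips]
        suspects = set()
        for t, ip, path in entries:
            if ip in googlebot_ips:
                continue
            # Any non-Google IP re-requesting a Googlebot-crawled file soon after.
            if any(p == path and timedelta(0) <= t - gt <= window
                   for gt, p in googlebot_hits):
                suspects.add(ip)
        return suspects
    ```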

    Naturally you also have monkeys working from their basements, with visions of being the next Google, who FAKE the Google user agent string in the vain hope that they will be able to leech down your web site content unimpeded – on the theory that no one would dare block the almighty Google. These jerks have an impact in that their pattern of activity and the whois lookups on their IPs yield the same lack of Google confirmation. So you BAN them just like you ban the ten or so ‘authorised but non-confirmable’ Google/Adsense spybot IPs.

    Lately this same pattern is showing up with Facebook and Twitter, where their official bot comes along to confirm or test links posted on their sites, and then, magically, a dozen other bots show up shortly thereafter and try to scarf down the identical files.

    My initial thought when seeing this pattern with Google and Adsense was that they were, for a price, giving certain paying customers immediate identification of sites where new content could be found and pillaged.

    I suspect that Facebook and Twitter are also running this revenue-generating scheme, selling early, exclusive access to the newest content confirmed by their bots. If so, it won’t be long before web sites that are having their content pillaged by these tag-along bots, and who BAN them, will find Twitter and Facebook handing out exclusions or other forms of punishment to sites that are taking measures to protect their content – and more importantly their bandwidth – from these unauthorized content harvesters.
