Aggressive Bots


These past couple of weeks I've been bombarded by a very pesky spider that seemed to request every single entry in my blog. The weird part was that the bot seemed to disregard 'robots.txt'; that is, it never bothered to read the file before running roughshod through my site. Because the IP addresses all came from the same block (the leading octets matched), I realized that I had a chance to defend myself. The user-agent appeared to be some Linux/Mozilla combination. Before banning the IPs from my site, I figured some investigative work was in order.
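A rough way to spot that kind of shared block is to group client addresses by network prefix. The Python sketch below assumes that grouping on the first two octets (a /16) is good enough for a first look. The two 208.111 addresses are the ones that show up in the error log later in this post; 192.0.2.10 is a made-up documentation address added for contrast, and `group_by_block` is a hypothetical helper name, not anything from my server setup.

```python
import ipaddress
from collections import Counter


def group_by_block(addresses, prefix_len=16):
    """Count how many addresses fall into each network of the given prefix length."""
    counts = Counter()
    for addr in addresses:
        # strict=False lets us pass a host address and get its enclosing network
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts


# Two addresses from the log excerpt in this post, plus one invented outsider
sample = ["208.111.154.15", "208.111.154.198", "192.0.2.10"]
blocks = group_by_block(sample)

for block, hits in blocks.most_common():
    print(block, hits)
```

A block that dominates the counts is a reasonable candidate for a closer look before any ban goes in.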

The 208.111.xxx block seemed to be connected to Limelight Networks LLC. Hmm, I thought it odd that Limelight would be interested in crawling my site. Doug Kaye and all of the IT Conversations folks didn't need any of my content, as they have one of the largest CDNs in the nation. So, after performing the ubiquitous GOOG search, I discovered that others in the blogosphere were having similar issues with these bots.

Grepping through my access and error logs proves that I have successfully blocked the IP.


[Fri Feb 8 06:20:12 2008] [error] [client 208.111.154.15] client denied by server configuration: /home/bkaeg/public_html/blog/archives/000632.html
[Fri Feb 8 04:01:42 2008] [error] [client 208.111.154.198] client denied by server configuration: /home/bkaeg/public_html/blog/archives/000522.html
[Fri Feb 8 04:01:43 2008] [error] [client 208.111.154.198] client denied by server configuration: /home/bkaeg/public_html/favicon.ico
[Fri Feb 8 04:11:20 2008] [error] [client 208.111.154.198] client denied by server configuration: /home/bkaeg/public_html/blog/archives/000053.html
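For anyone who would rather script that check than grep by hand, here is a minimal Python sketch that pulls the denied requests out of error-log text and tallies them per client. It assumes the log lines look exactly like the excerpt above; the regular expression and the `denied_requests` helper are my own illustration, not a standard Apache tool, and the sample text is just the four entries quoted in this post.

```python
import re
from collections import Counter

# Matches entries shaped like the excerpt: [timestamp] [error] [client IP] client denied ... : path
DENIED = re.compile(
    r"\[(?P<when>[^\]]+)\] \[error\] \[client (?P<ip>[\d.]+)\] "
    r"client denied by server configuration: (?P<path>[^\s\[]+)"
)


def denied_requests(log_text):
    """Return (timestamp, client IP, path) for every denied request in the text."""
    return [(m["when"], m["ip"], m["path"]) for m in DENIED.finditer(log_text)]


# The four entries quoted in this post
LOG = """\
[Fri Feb 8 06:20:12 2008] [error] [client 208.111.154.15] client denied by server configuration: /home/bkaeg/public_html/blog/archives/000632.html
[Fri Feb 8 04:01:42 2008] [error] [client 208.111.154.198] client denied by server configuration: /home/bkaeg/public_html/blog/archives/000522.html
[Fri Feb 8 04:01:43 2008] [error] [client 208.111.154.198] client denied by server configuration: /home/bkaeg/public_html/favicon.ico
[Fri Feb 8 04:11:20 2008] [error] [client 208.111.154.198] client denied by server configuration: /home/bkaeg/public_html/blog/archives/000053.html
"""

entries = denied_requests(LOG)
hits = Counter(ip for _, ip, _ in entries)

for ip, count in hits.most_common():
    print(ip, count)
```

Pointed at a real error log, the same tally gives a quick picture of which clients are still knocking after the block is in place.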

Because I am inquisitive by nature, I decided to take it a step further. A simple traceroute and a stealth nmap scan revealed that it was not Limelight Networks at all. It was actually Kavam, a company out of Tempe, AZ. When I say these bots were aggressive, I am not exaggerating. It was not uncommon to see 3-4 hits from the same bot within an hour.
I discovered that the CEO of Kavam was Randy Adams. So I called Mr. Adams and chatted with him, and he confirmed both my suspicions and the story I had read from other bloggers.

Apparently, Kavam is working with the same venture capitalists that funded Limelight. Kavam is taking snapshots of every website on the Internet. God bless 'em. They wish to challenge the GOOG. The brute-force method of getting content is admirable but quite annoying. I suppose it would not have been so bad if they had simply told people of their intent. Perhaps if it had gone on for just a couple of days... This barrage went on for two weeks.
I have been blogging for roughly five years, so there is quite a bit of content to be archived. I wonder why Fooky never took this gestapo approach to collecting metadata? Anyway, I did ask Mr. Adams to be a guest on AG Speaks; hopefully he will chat with me before the company launches its killer application.

2 Comments

    There is no need for a gestapo approach to collecting metadata. We at Fooky.com have no need to bombard servers over and over again.

    What I will say is that it could be a marketing ploy. Think about it - if I had Fooky.com ScorpionBots bombard millions of web sites each day, then thousands of techies who run web sites would investigate and discover who we are. I've decided against that kind of marketing, as I liken it to spam-style tactics.

    Please note that Fooky, Inc is only 'web crawling' for a limited engagement, as we will move away from this practice altogether. I can discuss this in more detail later if you like.

    Sure, I'd be happy to chat with you in more detail.
    Perfect topic for another edition of AG Speaks. Probably a conversation that we should have had a while ago.


    About this Entry

    This page contains a single entry by AG published on February 17, 2008 5:42 AM.
