Aggressive Bots


These past couple of weeks I've been bombarded by a very pesky spider that seemed to request every single entry in my blog. The weird part was that the bot seemed to disregard 'robots.txt'; that is, it never bothered to read the file before running roughshod through my site. Because the IP addresses all came from the same octet block, I realized that I had a chance to defend myself. The user-agent appeared to be some Linux/Mozilla combination. Before banning the IP block from my site, I figured that some investigative work would be necessary.
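For anyone wanting to run the same check on their own logs, here is a rough Python sketch that groups access-log hits by network block and notes whether any client in the block ever asked for robots.txt. It assumes the Apache common/combined log format and treats "same octet block" as a /24; the function name `hits_by_block` and the sample addresses (reserved documentation ranges) are made up for illustration and are not the bot's real IPs.

```python
import ipaddress
from collections import defaultdict

def hits_by_block(lines, prefix=24):
    """Group access-log lines by client network block.

    Returns {network: {"hits": int, "read_robots": bool}}.
    Assumes the client address is the first field and the request line
    is the first double-quoted field (Apache common/combined format).
    """
    blocks = defaultdict(lambda: {"hits": 0, "read_robots": False})
    for line in lines:
        parts = line.split('"')
        if len(parts) < 2:
            continue  # no quoted request line, skip
        ip = line.split()[0]
        request = parts[1].split()
        try:
            # strict=False masks off the host bits, e.g. 192.0.2.15 -> 192.0.2.0/24
            net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        except ValueError:
            continue  # first field was not an IP address
        blocks[net]["hits"] += 1
        if len(request) >= 2 and request[1].split("?")[0] == "/robots.txt":
            blocks[net]["read_robots"] = True
    return dict(blocks)

# Hypothetical sample lines, not taken from the real logs.
sample = [
    '192.0.2.15 - - [08/Feb/2008:04:01:42 -0500] "GET /blog/archives/000522.html HTTP/1.1" 200 5120',
    '192.0.2.77 - - [08/Feb/2008:04:01:43 -0500] "GET /favicon.ico HTTP/1.1" 200 318',
    '198.51.100.4 - - [08/Feb/2008:04:02:10 -0500] "GET /robots.txt HTTP/1.1" 200 64',
]
result = hits_by_block(sample)
for net, info in result.items():
    print(net, info["hits"], "hits, read robots.txt:", info["read_robots"])
```

A block with many hits and no robots.txt request is a reasonable candidate for a closer look, though a missing request alone does not prove the crawler ignored the file.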

The block seemed to be connected to Limelight Networks LLC. Hmm, I thought it odd that Limelight would be interested in crawling my site. Doug Kaye and all of the IT Conversations folks didn't need any of my content, as they have one of the largest CDNs in the nation. So, after performing the ubiquitous GOOG search, I discovered that others in the blogosphere were having similar issues with these bots.

Grepping through my access and error logs proves that I have successfully blocked the IP.

[Fri Feb 8 06:20:12 2008] [error] [client] client denied by server configuration: /home/bkaeg/public_html/blog/archives/000632.html
[Fri Feb 8 04:01:42 2008] [error] [client] client denied by server configuration: /home/bkaeg/public_html/blog/archives/000522.html
[Fri Feb 8 04:01:43 2008] [error] [client] client denied by server configuration: /home/bkaeg/public_html/favicon.ico
[Fri Feb 8 04:11:20 2008] [error] [client] client denied by server configuration: /home/bkaeg/public_html/blog/archives/000053.html
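If grep feels too manual, the same tally can be sketched in a few lines of Python. This is only an illustration: the regular expression is written against the error-log lines quoted above (with the client address treated as optional, since it is omitted in these excerpts), and the name `summarize_denials` is made up for the example.

```python
import re
from collections import Counter

# Matches Apache error-log lines like the ones quoted above. The client
# address is optional because it is left out of the excerpts shown here.
DENIED = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[error\] \[client(?: (?P<ip>[^\]]+))?\] "
    r"client denied by server configuration: (?P<path>\S+)"
)

def summarize_denials(lines):
    """Return (total denied requests, Counter of denied paths)."""
    paths = Counter()
    for line in lines:
        match = DENIED.match(line.strip())
        if match:
            paths[match.group("path")] += 1
    return sum(paths.values()), paths

sample = [
    "[Fri Feb 8 04:01:43 2008] [error] [client] client denied by server configuration: /home/bkaeg/public_html/favicon.ico",
    "[Fri Feb 8 04:11:20 2008] [error] [client] client denied by server configuration: /home/bkaeg/public_html/blog/archives/000053.html",
]
total, paths = summarize_denials(sample)
print(total, "denied requests")
for path, count in paths.most_common():
    print(count, path)
```

To run it against a real log, pass an open file object instead of `sample`; the log's location depends on the server setup.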

Because I am inquisitive by nature, I decided to take it a step further. A simple traceroute and stealth nmap revealed that it was not Limelight Networks at all. It was actually Kavam, a company out of Tempe, AZ. When I say these bots were aggressive, I am not exaggerating. It was not uncommon to see 3-4 hits from the same bot within an hour.
I discovered that the CEO of Kavam was Randy Adams. So I called Mr. Adams and chatted with him, and he confirmed both my suspicions and the story that I had read from other bloggers.

Apparently, Kavam is working with the same venture capitalists that funded Limelight. Kavam is taking snapshots of every website on the Internet. God bless 'em. They wish to challenge the GOOG. The brute force method of getting content is admirable but quite annoying. I suppose it would not have been so bad if they had simply told people of their intent. Perhaps if it had gone on for just a couple of days... This barrage went on for two weeks.
I have been blogging for roughly five years, so there is quite a bit of content to be archived. I wonder why Fooky never took this gestapo approach to collecting metadata? Anyway, I did ask Mr. Adams to be a guest on AG Speaks; hopefully he will chat with me before the company launches its killer application.

    About this Entry

    This page contains a single entry by AG published on February 17, 2008 5:42 AM.
