Eric Stewart: Running Off At The Mouth

AOL – Who do they think they are?

by Eric Stewart on Mar.15, 2007, under Internet Service Providers, Technology, The Internet

So, if you run your own web site, or if you are a site admin, or are someone worried about statistics, or more specifically, about what search engines are doing to your site, here’s a little tidbit of info for you.

If you’ve been doing it for a while, you’re aware of “robots.txt” – a file you can put in your site’s document root that is supposed to tell search engines which directories they aren’t supposed to sweep and index. For bandwidth and copyright purposes, I have my video and images directories set up as exclusions in my robots.txt.
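For anyone who hasn't set one up, here is a rough sketch of what such exclusions look like and how a well-behaved crawler is expected to read them. It uses Python's standard urllib.robotparser. The directory names /images/ and /video/ and the ROBOTS_TXT contents are assumptions for illustration, not a copy of my actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the directory names are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /video/
"""


def is_allowed(path, user_agent="*"):
    """Return True if robots.txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/images/photo.jpg", "/video/clip.mpg"):
        print(path, "allowed" if is_allowed(path) else "excluded")
```

A crawler that honors the file would skip anything under the two excluded directories. Nothing forces it to, which is the whole problem.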

So I was looking at my stats the other day and noticed that some of my top visitors, and definitely my top bandwidth users, were all from one domain. When it resolved, it was *.search.aol.com. And after looking at the logs, it became apparent that the *.search.aol.com robot was sweeping my images and video.
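If you want to check your own logs for the same thing, something like the following would do it. This is only a sketch: it assumes an Apache-style access log where the first field is an already-resolved hostname (hostname lookups turned on, or a log that has been post-processed), and the restricted directory names and sample lines are made up.

```python
from collections import Counter

# Assumptions: adjust these to match your own robots.txt exclusions.
RESTRICTED_PREFIXES = ("/images/", "/video/")
CRAWLER_SUFFIX = ".search.aol.com"


def restricted_hits(log_lines, suffix=CRAWLER_SUFFIX,
                    restricted=RESTRICTED_PREFIXES):
    """Count requests per host for restricted paths from hosts ending in suffix."""
    hits = Counter()
    for line in log_lines:
        # The request line is the first double-quoted field in the log entry.
        parts = line.split('"')
        if len(parts) < 3:
            continue
        host_fields = parts[0].split()
        request_fields = parts[1].split()
        if not host_fields or len(request_fields) < 2:
            continue
        host = host_fields[0].lower()
        path = request_fields[1]
        if host.endswith(suffix) and path.startswith(restricted):
            hits[host] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical log lines, not taken from a real server.
    sample = [
        'crawl1.search.aol.com - - [15/Mar/2007:10:00:00 -0500] "GET /images/a.jpg HTTP/1.1" 200 1234',
        'crawl1.search.aol.com - - [15/Mar/2007:10:00:05 -0500] "GET /index.html HTTP/1.1" 200 512',
        'host.example.org - - [15/Mar/2007:10:00:09 -0500] "GET /video/b.mpg HTTP/1.1" 200 99999',
    ]
    for host, count in restricted_hits(sample).most_common():
        print(host, count)
```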

I wasn’t so much happy about that.

So, from one admin to another, I suggest you go through your robots.txt file and set up the web server to deny .search.aol.com access to the directories you have excluded there (and, since you’re doing it, throw in the other search engines like “.inktomi.com” and “.googlebot.com”).
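Normally you would do this in the web server's own access control configuration, but the logic is simple enough to sketch in Python for anyone handling requests in application code. The names should_deny and reverse_lookup are hypothetical helpers, and the restricted directories are assumptions. Only the domain suffixes come from the list above.

```python
import socket

# Domain suffixes from the post; the restricted directories are assumptions.
BLOCKED_SUFFIXES = (".search.aol.com", ".inktomi.com", ".googlebot.com")
RESTRICTED_PREFIXES = ("/images/", "/video/")


def reverse_lookup(ip):
    """Return the reverse-DNS hostname for ip, or '' if it does not resolve."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ""


def should_deny(ip, path, resolve=reverse_lookup):
    """Deny when a blocked crawler domain requests a restricted directory."""
    if not path.startswith(RESTRICTED_PREFIXES):
        return False
    host = resolve(ip).lower().rstrip(".")
    # Prefix a dot so a bare "googlebot.com" also matches ".googlebot.com".
    return ("." + host).endswith(BLOCKED_SUFFIXES)


if __name__ == "__main__":
    # A stand-in resolver so the example runs without touching DNS.
    def fake_resolver(ip):
        return "crawl1.search.aol.com"

    print(should_deny("192.0.2.10", "/images/a.jpg", resolve=fake_resolver))
    print(should_deny("192.0.2.10", "/index.html", resolve=fake_resolver))
```

A request handler could return a 403 whenever should_deny is true. Keep in mind that reverse DNS costs a lookup per new visitor and is controlled by whoever owns the IP block, so treat this as a bandwidth saver rather than real security.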

You shouldn’t have to do this. But a major site indexer is indexing stuff it’s been told not to.


