On Mon, Jan 09, 2012 at 10:17:32PM +0000, steve-ALUG@hst.me.uk wrote:
On 09/01/12 14:46, Chris Green wrote:
On Mon, Jan 09, 2012 at 02:35:29PM +0000, nev young wrote:
I find that the Google web crawler, and a few others, keep requesting a URL once they have found it, even though the page may be long gone and now returns a 404. The bots seem unable to take the hint.
You're right! All the recent accesses are from Google. I'm sure that when I looked a few days ago they weren't, but they certainly are now. So I'm even less worried than I was (which wasn't very).
Could it be that Google have noticed that there's nothing there and are scanning it frequently to find when it comes back?
Anyway, you may be able to stop it with a robots.txt file in the root of your website. Personally, I'd think that a robots.txt file is a good idea on any website that has bits you don't want search engines to hit, even if some web crawlers don't honour it.
Ah, yes, I used to have a robots.txt file, maybe I should reinstate it.
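For anyone wanting to check what a robots.txt would actually tell a well-behaved crawler, here is a rough sketch using Python's standard urllib.robotparser module. The file contents, the /old-pages/ path, the example.com URLs and the is_allowed helper are all made up for illustration; nothing here is taken from the site being discussed, and crawlers that ignore robots.txt will of course ignore this too.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: ask every crawler to stay out of
# /old-pages/ (an assumed path, not one from the thread).
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-pages/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler may fetch url under robots_txt."""
    parser = RobotFileParser()
    # parse() takes the file as an iterable of lines, so no network
    # access is needed to test a draft file.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/old-pages/gone.html"):
        print(url, is_allowed("Googlebot", url))
```

The same two lines of ROBOTS_TXT, saved as robots.txt in the web root, are what the crawler itself would read. This only covers crawlers that choose to honour the file.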