On 09/01/12 22:17, steve-ALUG@hst.me.uk wrote:
On 09/01/12 14:46, Chris Green wrote:
On Mon, Jan 09, 2012 at 02:35:29PM +0000, nev young wrote:
I find that the Google web-crawler bot, and a few others, keep hitting a URL once they have found it, even though the page may be long gone and now gives a 404. The bots seem unable to take the hint.
You're right! All the recent accesses are from Google. I'm sure when I looked a few days ago they weren't, but they certainly are now. So I'm even less worried than I was (which wasn't very).
Could it be that Google have noticed that there's nothing there and are scanning it frequently to find when it comes back?
Could be, but after a few months you'd think they'd give up. Google can be told to stop crawling dead pages via their Webmaster Tools pages, although I've given up doing that.
Anyway, you may be able to stop it with a robots.txt file in the root of your website. Personally, I think a robots.txt file is a good idea on any website that has bits you don't want search engines to hit, even if some web crawlers don't honour it.
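For what it's worth, a minimal robots.txt sketch might look something like this (the paths here are just placeholders for whatever you want crawlers to skip):

    # Applies to all well-behaved crawlers
    User-agent: *
    Disallow: /old-stuff/
    Disallow: /private/

A single "Disallow: /" under "User-agent: *" would keep compliant bots out of the whole site instead.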
robots.txt is a two-edged sword.
Good bots stay out when told. Bad bots ignore it and enter anyway. And blackhats are alerted that you have pages you don't want them to see.
(hmmm, maybe that's a three-edged sword).