Travel Technology - Robots.txt - usage on a personal site?




bowdenj
Nov 29, 08, 8:20 pm
Do you folks utilize a robots.txt file?

Lately I've noticed live.com searching my site, and right after it did, I suddenly started seeing a ton of unfamiliar robots from unfamiliar locations.

Do you selectively let some robots search your site (e.g., Google)?


sbm12
Nov 29, 08, 8:59 pm
I have nothing to hide. If I did, it wouldn't be on a website. ;)

Are you worried that the content is indexed now??

bdjohns1
Nov 29, 08, 10:02 pm
My personal website is pretty lightly trafficked. When it's all said and done, Google and other crawlers are about a third of my traffic.


callie-girl
Nov 30, 08, 6:55 am
Yes, I use it. Started because my company site had proprietary information from my clients, kept it because I liked the idea of all my pages being private. With the robots.txt file, I don't even have to worry if I put my kids/grandkids photos up for the family to share.

It became part of any contract for a website (early days of the company) that the discussion of the robots.txt file was mandatory - so that each client knew what it was and why it was good or bad for their site.

Now we only handle one other site outside our own group and they don't have one but it's a restaurant.

sbm12
Nov 30, 08, 8:35 am
Yes, I use it. Started because my company site had proprietary information from my clients, kept it because I liked the idea of all my pages being private. With the robots.txt file, I don't even have to worry if I put my kids/grandkids photos up for the family to share.

Except that a robots.txt file doesn't really guarantee that. If you've got information "out there" you should assume that it is NOT private unless appropriately protected by an authentication mechanism. Indexing engines do not have to respect the robots.txt file if they don't want to. Plus, if someone links directly to the "private" site it may get indexed anyway.

UAVirgin
Nov 30, 08, 1:55 pm
Robots.txt only keeps well-behaved search bots from indexing files or directories on your web site. It will not secure anything from poorly behaved bots or people. If you don't want someone to see something you'll need to take appropriate steps to secure the content.

alanh
Nov 30, 08, 8:14 pm
Robots.txt is to keep robots from overloading your site. The idea is that you can keep them out of directories that might heavily load your site -- for example, dynamically generated pages that link to more dynamically generated pages. You can also totally exclude them if you don't want to spend the bandwidth being indexed.

In no way is it a privacy measure -- quite the opposite. A "Disallow: /ReallySecretStuffNobodyShouldSee/" will keep Google out, but it's a flag for someone malicious that there's something interesting there.
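
As a rough sketch of the load-control use described above, Python's standard urllib.robotparser can show how a well-behaved crawler would read such rules. The paths /search/ and /calendar/, the bot name ExampleBot, and the example.com host are all made up for illustration; nothing here enforces anything on the server side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of two dynamically
# generated areas (both paths are invented for this example).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /calendar/
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a compliant crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/search/?q=hotels", "/calendar/2008/11/"):
        url = "http://example.com" + path
        print(path, allowed(ROBOTS_TXT, "ExampleBot", url))
```

The check runs entirely inside the crawler, which is why the file only helps against bots that choose to run it.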

sdm1130
Nov 30, 08, 10:53 pm
[sdm2@absolut ~]$ more /home/www/html/robots.txt
User-agent: *
Disallow: /


I do use robots.txt. While I don't have anything on my personal domain that I need to hide, there is also not anything on there worth indexing. :)
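
For anyone wondering what that two-line file means to a compliant crawler, here is a small sketch using Python's standard urllib.robotparser. The second rule set (GOOGLE_ONLY) is a hypothetical variant, not something from the post above, showing one way to let a single crawler in while excluding the rest; ExampleBot and example.com are placeholder names.

```python
from urllib.robotparser import RobotFileParser

# The two-line file shown in the post above: exclude every robot from everything.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical variant: an empty Disallow lets Googlebot crawl everything,
# while all other robots remain excluded.
GOOGLE_ONLY = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a compliant crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://example.com/photos/a.jpg"
    for name, rules in (("BLOCK_ALL", BLOCK_ALL), ("GOOGLE_ONLY", GOOGLE_ONLY)):
        for agent in ("Googlebot", "ExampleBot"):
            print(name, agent, may_fetch(rules, agent, url))
```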

Hartmann
Dec 1, 08, 7:55 am
Robots.txt is to keep robots from overloading your site. The idea is that you can keep them out of directories that might heavily load your site -- for example, dynamically generated pages that link to more dynamically generated pages. You can also totally exclude them if you don't want to spend the bandwidth being indexed.

In no way is it a privacy measure -- quite the opposite. A "Disallow: /ReallySecretStuffNobodyShouldSee/" will keep Google out, but it's a flag for someone malicious that there's something interesting there.

Robots.txt only keeps well-behaved search bots from indexing files or directories on your web site. It will not secure anything from poorly behaved bots or people. If you don't want someone to see something you'll need to take appropriate steps to secure the content.

These two posts are on the mark. The robots.txt file is only for stuff you would not like indexed by the big search engines (old or archived information, etc.) and not for hiding things.

Since the file is readable by any visitor to your site, it can be used to find the files you are actually trying to keep out of view.

If you don't want your files found, either keep them off the internet or put them in a secured directory.
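
To illustrate the point that the file is readable by any visitor, here is a minimal sketch of how little effort it takes to turn a robots.txt into a list of "interesting" paths. The function name disallowed_paths and the /cgi-bin/ entry are invented for this example; the first path is the one quoted earlier in the thread.

```python
# Sample robots.txt text; a real one is served publicly at /robots.txt.
SAMPLE = """\
User-agent: *
Disallow: /ReallySecretStuffNobodyShouldSee/
Disallow: /cgi-bin/  # hypothetical entry
Disallow:
"""


def disallowed_paths(robots_txt: str) -> list:
    """Collect the path from every non-empty Disallow line."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    for path in disallowed_paths(SAMPLE):
        print(path)
```

Anything listed this way is effectively advertised, so it should sit behind authentication rather than behind a Disallow line.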


