We're in Google!
#1
FlyerTalk Evangelist
Original Poster
Join Date: Feb 1999
Location: Seat 1A, Juice pretty much everywhere, Mucci des Coins Exotiques
Posts: 34,339
We're in Google!
I'm not sure if this is the right spot, but I was shocked today to see a random recent post of mine show up in a Google search. Is someone on Flyertalk actively adding all our threads to Google?
#2
FlyerTalk Evangelist
Join Date: Sep 2000
Posts: 37,486
Originally Posted by stimpy
I'm not sure if this is the right spot, but I was shocked today to see a random recent post of mine show up in a Google search. Is someone on Flyertalk actively adding all our threads to Google?
#3
Join Date: Oct 2002
Location: UK
Posts: 7,560
Originally Posted by ScottC
Google "spiders" pretty much every website; with vBulletin it's probably grabbing much more than it used to on UBB.
#4
Join Date: Jun 2002
Location: SFO/JFK/DRO
Posts: 275
Easy enough to do
There's a standard method of excluding search engines from all or part of a website through the use of a file named "robots.txt." In FT's case, I think this might be a good idea, at least for certain parts of the site (e.g., CC) that might attract the "wrong sort" of people/attention from Google surfers...
See http://www.google.com/webmasters/3.html#B3
I don't want Google to crawl part or all of my site.
There is a standard method involving a "robots.txt" file for excluding robot crawlers. This will prevent Googlebot or other crawlers from visiting your site. Googlebot has a user-agent of "Googlebot". In addition, Googlebot understands some extensions to the robots.txt standard: Disallow patterns may include * to match any sequence of characters, and patterns may end in $ to indicate that the $ must match the end of a name. For example, to prevent Googlebot from crawling files that end in gif, you may use the following robots.txt entry:
User-agent: Googlebot
Disallow: /*.gif$
There is another standard for telling robots not to index a particular web page or follow links on it, which may be more helpful, since it can be used on a page-by-page basis. This method involves placing a "META" element into a page of HTML.
Remember, changing your server's robots.txt file or changing the "META" elements on its pages will not cause an immediate change in what results Google returns. It is likely that it will take a while for any changes you make to propagate to Google's next index of the web.
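For anyone wondering how the * and $ extensions quoted above behave in practice, here is a rough sketch in Python. It only illustrates the matching rule as the FAQ describes it; the helper names (google_pattern_to_regex, is_disallowed) and the sample paths are made up for the example, and this is not Googlebot's actual code.

```python
import re


def google_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Disallow pattern with * and an optional trailing $ into a regex.

    Hypothetical helper, written only to illustrate the rule in the FAQ.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then let every * stand for "any sequence of characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, pattern: str) -> bool:
    """True if the URL path is covered by the Disallow pattern (matched from the start)."""
    return google_pattern_to_regex(pattern).match(path) is not None


# The FAQ's own example pattern, checked against a few made-up paths.
for sample in ("/images/logo.gif", "/images/logo.gif?size=big", "/index.html"):
    print(sample, is_disallowed(sample, "/*.gif$"))
```

With the trailing $, only paths that actually end in .gif are blocked; drop the $ and the pattern also covers a path such as /images/logo.gif?size=big.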
#6
FlyerTalk Evangelist
Join Date: Jun 2000
Location: Sunny SYDNEY!
Programs: UA Million Miler. (1.9M) Virgin Platinum. HH Diamond + SPG Gold
Posts: 32,330
Seems to me that our posts were always spidered by Google?
Type any user name into Google and you'll find matches even from the old FT AFAIK.
#7
FlyerTalk Evangelist
Join Date: Sep 2000
Posts: 37,486
Originally Posted by ozstamps
Seems to me that our posts were always spidered by Google?
Type any user name into Google and you'll find matches even from the old FT AFAIK.
#8
Flyertalk Evangelist and Moderator: Coupon Connection and Travel Products
Join Date: Jul 2000
Location: Milton, GA USA
Programs: Hilton Diamond, IHG Platinum Elite, Hyatt Discoverist, Radisson Elite
Posts: 19,040
To get this thread to relevance for this forum:
And just what is the TALKBOARD doing about this Google revelation?
William
#9
FlyerTalk Evangelist
Original Poster
Join Date: Feb 1999
Location: Seat 1A, Juice pretty much everywhere, Mucci des Coins Exotiques
Posts: 34,339
One thing I think FlyerTalk can do is tell everyone (perhaps via Talkmail?) what is going on. I'm sure many of us don't realize that our posts are being exposed in such a way to the outside world. Yes, anyone can read FlyerTalk, but that takes effort. As we know, Google allows people to easily grab the most arcane and buried information. Just a warning to think about Google before you post certain info might be good.
#10
Join Date: Nov 2002
Location: CH / D
Programs: Amex, Avis, BA, BD, CX, FS, Hertz, HH, IC, LH, NH, RC, RCCL, Sixt, SPG, SQ, UA
Posts: 7,050
I got to FT when searching Google for a hotel name. I forget which property, but it brought FT to my attention. I am kind of happy and mad about it... all those hours lurking and posting, but on the other hand I enjoy doing so...
^ ^
#12
Join Date: Jun 2004
Location: www.percussionking.com
Programs: UA Alluminum =-D
Posts: 173
TechTV had an article on this, but they're going through some major changes so I hesitate to post a link; it may soon be invalid.
You have to put robots.txt on each server at the root directory for the website. The one for my webpage looks something like this:
---Start of file---
# robots.txt for http://www.mywebpage.com/
# Disallow: /folder_name/
User-agent: *
Disallow: /
---End of file---
This will prevent all search engines using this method from looking in any of my folders. The first two lines are completely unnecessary; they are just instructions for the person writing the file.
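If you want to check what a file like the one above actually blocks before uploading it, Python's standard urllib.robotparser module can read the rules directly. This is just a quick sketch: the www.mywebpage.com URLs are the placeholder from the post, and the second file (with the folder_name line uncommented) is an assumed variant to show partial blocking.

```python
from urllib.robotparser import RobotFileParser

# The file from the post: block every crawler from the whole site.
BLOCK_ALL = """\
# robots.txt for http://www.mywebpage.com/
# Disallow: /folder_name/
User-agent: *
Disallow: /
"""

# Assumed variant: block only one folder (the commented-out line, made active).
BLOCK_ONE_FOLDER = """\
User-agent: *
Disallow: /folder_name/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in a string instead of fetching it from a server."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


block_all = load_rules(BLOCK_ALL)
block_one = load_rules(BLOCK_ONE_FOLDER)

print(block_all.can_fetch("Googlebot", "http://www.mywebpage.com/index.html"))
print(block_one.can_fetch("Googlebot", "http://www.mywebpage.com/folder_name/page.html"))
print(block_one.can_fetch("Googlebot", "http://www.mywebpage.com/other/page.html"))
```

Keep in mind that robots.txt only affects crawlers that choose to honor it, and that urllib.robotparser follows the basic standard, so it will not understand Google's * and $ extensions.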
#13
Moderator: Coupon Connection & S.P.A.M
Join Date: May 2000
Location: Louisville, KY
Programs: Destination Unknown, TSA Disparager Diamond (LTDD)
Posts: 57,952
Originally Posted by wharvey
To get this thread to relevance for this forum:
And just what is the TALKBOARD doing about this Google revelation?
William
Off to Technical Forum with this topic!
#14
Join Date: Oct 2001
Location: BOS
Programs: AA LTG EXP, HH Diamond
Posts: 3,419
I mentioned this to Randy a while ago: we didn't used to appear in the search engines, but after the switchover it appears the default setting went from no robots to letting them crawl the site. I think I would feel a lot better if at a minimum they didn't index OMNI and probably a number of other boards, such as the rental car forums, which tend to be all about codes etc., and trip reports and itineraries, where people tend to put out information that they don't really want to share with the whole wide world.
Maybe it's just me, but I think it would be a good idea if most of the site did not appear in the search engines.
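One way the per-board idea could work is the META element mentioned in the Google FAQ quoted earlier in the thread, since it is set page by page. Below is a minimal sketch, assuming the forum software can tell which board a page belongs to when it builds the page. The board names in PRIVATE_BOARDS and the function robots_meta_tag are hypothetical and only for illustration; nothing here reflects how FT or vBulletin is actually configured.

```python
# Hypothetical list of boards to keep out of search indexes (assumption, not FT's real setup).
PRIVATE_BOARDS = {"OMNI", "Rental Cars", "Trip Reports"}


def robots_meta_tag(board: str) -> str:
    """Return the robots META element a page in the given board could carry."""
    if board in PRIVATE_BOARDS:
        # Ask crawlers not to index this page and not to follow its links.
        content = "noindex, nofollow"
    else:
        content = "index, follow"
    return f'<meta name="robots" content="{content}">'


print(robots_meta_tag("OMNI"))
print(robots_meta_tag("Technical Forum"))
```

As the FAQ notes, pages that are already indexed would not disappear right away; the change only shows up once the search engine revisits them.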
#15
Join Date: Nov 2002
Location: CH / D
Programs: Amex, Avis, BA, BD, CX, FS, Hertz, HH, IC, LH, NH, RC, RCCL, Sixt, SPG, SQ, UA
Posts: 7,050
Maybe it's just me, but I think it would be a good idea if most of the site did not appear in the search engines.
Exactly my thought. There is a lot of personal information disclosed here that we do share with our FT friends but prefer not to share with the airlines, rental car companies, etc.
-Links to Targeted Promotions,
-CDWs and AWDs and other Discount codes
-Information about flights taken and description of service attendants
-Information about personal habits (e.g. in OMNI!)
-Links to photos of my family as part of trip reports
-Photos of many FTers can be accessed through this site, as well as personal websites and pages with real names and not just the FT handle
Please make sure that the items mentioned above cannot be found in searches and matched to the e-mail address given for FT registration purposes.