We're in Google!
I'm not sure if this is the right spot, but I was shocked today to see a random recent post of mine show up in a Google search. Is someone on Flyertalk actively adding all our threads to Google?
|
Originally Posted by stimpy
I'm not sure if this is the right spot, but I was shocked today to see a random recent post of mine show up in a Google search. Is someone on Flyertalk actively adding all our threads to Google?
|
Originally Posted by ScottC
Google "spiders" pretty much every website, with VBulletin it's probably grabbing much more than it used to on UBB.
|
Easy enough to do
There's a standard method of excluding search engines from all or part of a website through the use of a file named "robots.txt." In FT's case, I think this might be a good idea, at least for certain parts of the site (e.g., CC) that might attract the "wrong sort" of people/attention from Google surfers...
See http://www.google.com/webmasters/3.html#B3 :
I don't want Google to crawl part or all of my site.
There is a standard method involving a "robots.txt" file for excluding robot crawlers. This will prevent Googlebot or other crawlers from visiting your site. Googlebot has a user-agent of "Googlebot". In addition, Googlebot understands some extensions to the robots.txt standard: Disallow patterns may include * to match any sequence of characters, and patterns may end in $ to indicate that the $ must match the end of a name. For example, to prevent Googlebot from crawling files that end in gif, you may use the following robots.txt entry:
User-agent: Googlebot
Disallow: /*.gif$
There is another standard for telling robots not to index a particular web page or follow links on it, which may be more helpful, since it can be used on a page-by-page basis. This method involves placing a "META" element into a page of HTML.
Remember, changing your server's robots.txt file or changing the "META" elements on its pages will not cause an immediate change in what results Google returns. It is likely that it will take a while for any changes you make to propagate to Google's next index of the web. |
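For anyone curious how the * and $ extensions quoted above behave, here is a rough Python sketch. It is only an illustration of the matching rule as described in that FAQ text, not Googlebot's actual code; the function name matches_disallow and the sample paths are made up for the example.

```python
import re


def matches_disallow(pattern: str, path: str) -> bool:
    """Return True if `path` is covered by a Googlebot-style Disallow pattern.

    Hypothetical helper: '*' matches any sequence of characters, and a
    trailing '$' means the pattern must match through the end of the path.
    Without '$', the pattern only has to match the start of the path.
    """
    if not pattern:
        # An empty Disallow line excludes nothing.
        return False
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    regex = re.compile(
        ".*".join(re.escape(piece) for piece in pattern.split("*")),
        re.DOTALL,
    )
    if anchored:
        return regex.fullmatch(path) is not None
    return regex.match(path) is not None


if __name__ == "__main__":
    # Sample paths are invented for illustration.
    for sample in ("/images/logo.gif", "/images/logo.gif?size=2", "/index.html"):
        print(sample, matches_disallow("/*.gif$", sample))
```

With the FAQ's entry (Disallow: /*.gif$), a path has to end in .gif to be excluded, so a URL with a query string after the .gif would not be covered by this sketch.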
Why would FT want to do that?
Kinda defeats the point, doesn't it? |
Seems to me that our posts were always spidered by Google?
Type any user name into Google and you'll find matches even from the old FT AFAIK. |
Originally Posted by ozstamps
Seems to me that our posts were always spidered by Google?
Type any user name into Google and you'll find matches even from the old FT AFAIK. |
To get this thread to relevance for this forum:
And just what is the TALKBOARD doing about this Google revelation? :)
William |
One thing I think Flyertalk can do is tell everyone (perhaps via Talkmail?) what is going on. I'm sure many of us don't realize that our posts are being exposed in such a way to the outside world. Yes anyone can read Flyertalk, but that takes effort. As we know Google allows people to easily grab the most arcane and buried information. Just a warning to think about Google before you post certain info might be good.
|
I got to FT when searching Google for a hotel name. I forget which property, but it certainly brought FT to my attention. I am kind of happy and mad about it... all those hours lurking and posting, but on the other hand I enjoy doing so...
^ :) ^ |
I just typed in my user name and got 156 matches and they were all from FT
|
TechTV had an article on this, but they're going through some major changes so I hesitate to post a link; it may soon be invalid.
You have to put robots.txt on each server at the root directory for the website. The one for my webpage looks something like this:
---Start of file---
# robots.txt for http://www.mywebpage.com/
# Disallow: /folder_name/
User-agent: *
Disallow: /
---End of file---
This will prevent all search engines using this method from looking in any of my folders. The first two lines are completely unnecessary; they are just instructions for the person writing the file. |
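If you want to check what a file like the one above tells a well-behaved crawler, Python's standard library has a parser for it (urllib.robotparser). A minimal sketch follows; the helper name is_allowed, the second sample file, and the page URLs under www.mywebpage.com are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The file from the post above: the two '#' lines are comments,
# and 'Disallow: /' excludes the whole site for every user-agent.
BLOCK_EVERYTHING = """\
# robots.txt for http://www.mywebpage.com/
# Disallow: /folder_name/
User-agent: *
Disallow: /
"""

# Hypothetical variant that excludes only one folder.
BLOCK_ONE_FOLDER = """\
User-agent: *
Disallow: /folder_name/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler honoring robots.txt may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.mywebpage.com/folder_name/page.html"
    home = "http://www.mywebpage.com/index.html"
    print(is_allowed(BLOCK_EVERYTHING, "Googlebot", page))
    print(is_allowed(BLOCK_ONE_FOLDER, "Googlebot", page))
    print(is_allowed(BLOCK_ONE_FOLDER, "Googlebot", home))
```

Keep in mind this only models crawlers that choose to obey robots.txt; it does nothing to stop a person or a badly behaved robot from reading the pages.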
Originally Posted by wharvey
To get this thread to relevance for this forum:
And just what is the TALKBOARD doing about this Google revelation? :) William
Off to Technical Forum with this topic! :) |
I mentioned this to Randy a while ago: we didn't used to appear in the search engines, but after the switchover it appears the default setting went from no robots to letting them crawl the site. I think I would feel a lot better if at a minimum they didn't index OMNI :D and probably a number of other boards, such as the rental car forums, which tend to be all about codes etc., and trip reports and itineraries, where people tend to put out information that they don't really want to share with the whole wide world.
Maybe it's just me, but I think it would be a good idea if most of the site did not appear in the search engines. |
Maybe it's just me, but I think it would be a good idea if most of the site did not appear in the search engines.
Exactly my thought. There is a lot of personal information disclosed here that we do share with our FT friends but prefer not to share with the airlines, rental car companies, etc.
-Links to targeted promotions
-CDWs and AWDs and other discount codes
-Information about flights taken and descriptions of service attendants
-Information about personal habits (e.g. in OMNI!)
-Links to photos of my family as part of trip reports
-Photos of many FTers can be accessed through this site as well as personal websites and pages with the real names and not just the FT handle
Please make sure that in searches the items mentioned above cannot be found and matched to the e-mail address given for FT registration purposes. |
Originally Posted by flamboyant 1
Please make sure that in searches the items mentioned above cannot be found and matched to the e-mail address given for FT registration purposes.
|
Some people were horrified when deja.com (since bought by google) started archiving usenet postings and making them searchable on the web. They thought their usenet postings would disappear forever when the posts dropped off the news servers because of age.
More people were horrified when google started indexing pdf, Lotus, Excel, PowerPoint, Word, rtf and other formats. Some companies had put payroll info, etc on the web in those formats, in the belief that no one would know they were there. I apologize for stating the obvious, but even if there is a robots.txt exclusion for the forums (and the powers-that-be may not want to do this because of the number of people who become new flyertalkers by seeing a forum thread in a google search result), the public can still read these forums. As I write this, 315 people have viewed this thread alone. Nobody knows who they are. |
Usenet has always been intentionally broadcast to the world. You always knew that the whole world could easily read your posts without having to do any kind of extra work. It was sent to them directly. And people have been archiving Usenet since the 80's. So while I have Usenet posts from over 10 years ago that are accessible by Google, I don't mind it. However Flyertalk is not the same. We don't send Flyertalk feeds to every internet site in the world. It's our little club. And as I said earlier, surfing to flyertalk.com and poking around isn't the same as plugging in Hotel XXX sucks into Google and finding some hate thread of ours.
|
stimpy you idiot........
yeah, what he said........ FT is not Usenet, FT is not "the big wide world". There is a difference between it being "possible" for someone to find your post about how you are going to be out of town for 2 months on a RTW trip, with eventual cross-references to pictures of you and a good guess at where you live, and it being just a Google search away. A big difference.
|
When I run my handle through google, I get about 6 links and the following message:
Originally Posted by google
In order to show you the most relevant results, we have omitted some entries very similar to the 6 already displayed.
|
So temptation gets the better of me and I check my handle. What do I get but 2 links to FT and another 377 totally unrelated links :eek:
|
Originally Posted by Kiwi Flyer
So temptation gets the better of me and I check my handle. What do I get but 2 links to FT and another 377 totally unrelated links :eek:
More results from www.flyertalk.com
Click that link and you see:
Results 1 - 10 of about 156 from www.flyertalk.com for "Kiwi Flyer" |
Google indexing the site is a good thing.
We're not likely to block the largest and best search engine. |
Originally Posted by John at Webflyer
Google indexing the site is a good thing.
We're not likely to block the largest and best search engine.
Seriously, any chance we can have the OMNI, Itinerary and CommunityBuzz! boards blocked from the search engines? I don't think they add anything to advancing FT's goals, and I think they tend to contain personal info that, while we have chosen to share it with FT, I don't think anyone really wants to see in Google. Well, what about it? |
Fascinating stuff, though...
Got 25 hits on my user name. Some were blasts from the past. Some gave me a "this page cannot be displayed" error, and some gave me a bunch of blue FT borders repeated down the screen with no content. I have no problem with the indexing, just find it interesting as to what's captured and what's not. ^ |
I share the hesitancy of having all posts "exposed" to the world. But they already are, just not so easily found. Perhaps the next time we are sent our monthly FT email, we can be apprised of the opening of the gates?
On the other hand, the search function in Google is so much better than the current search function -- faster, doesn't have trouble with single or double letters, easy to look for phrases and exclude items -- that I wonder if we can have Google be our own search engine. |
Google is a way of life... Learn its positives. Type in your phone number... if you're in "the book" you're in Google... complete with address.
As for FT, I view this as a good thing, potentially bringing more people to our second home. If a lone surfer types in "United Award Ticket Info" and gets directed here, I believe that's a good thing. Bottom line: Flyertalk IS the "outside world". If you don't want it archived for life... don't post it. |