favoritearticlesinc.com

One Million Pages of WebmasterWorld Dropped by Google as Forum Bans Bots

 

WebmasterWorld, the top internet forum and best-known discussion site for website owners, has been dropped entirely from Google! A site with over a million pages and over 2 million page views a month just disappeared from search engines! How often have you been searching for the answer to an issue affecting your web site when you found a thread from the WebmasterWorld forums in the top search results?

You won't see WebmasterWorld in search results again until this bot ban is reversed.

The following URL drops you into the middle of the "FOO" forum discussion, which runs over 40 pages at the time of this writing. The page opens with a nice recap of the issues that summarizes much of the previous 23 pages of discussion.

http://www.webmasterworld.com/forum9/9618-1-10.htm

Site owner Brett Tabke is being grilled, toasted and roasted by forum members for requiring logins (and assigning cookies) for all visitors and effectively locking out all search engine spiders. One big issue is the lack of effective site search, now that you can't use a "site:WebmasterWorld.com" query in Google to find WebmasterWorld info on specific issues. Tabke is being slammed for not having an effective site search function in place before getting the site dropped.

WebmasterWorld was removed entirely from Google after Tabke decided to use robots.txt to block all crawlers with a universal disallow rule:

User-agent: *
Disallow: /
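
As a quick check of what that two-line file means to a well-behaved crawler, the sketch below feeds the same rules to Python's standard `urllib.robotparser` module. The crawler names tested are illustrative choices, and the result only describes bots that choose to honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The blanket ban quoted above: every user agent, every path.
BLANKET_BAN = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLANKET_BAN.splitlines())

# Illustrative crawler names; any compliant bot gets the same answer.
url = "http://www.webmasterworld.com/forum9/9618-1-10.htm"
allowed = {bot: parser.can_fetch(bot, url) for bot in ("Googlebot", "Slurp", "msnbot")}
print(allowed)
```

Every compliant crawler is refused, which is why the site drops out of the engines' indexes.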

He has stated that this is due to rogue bots clogging the server and slowing site performance, scraping and re-using content, and searching forum comments for web reputation information on individual companies. I have a similar problem at my site on a much smaller scale. Crawlers can request pages at excessive rates that slow site performance for visitors. I've instituted a "Crawl-delay" for Yahoo and MSN, but rogue bots don't follow robots.txt instructions. (Google is more polite and requests pages at a more leisurely rate.)

I can't say I completely understand the WebmasterWorld decision to ban all bots, or whether it will achieve what Tabke is after, but it sure is creating a buzz in search engine circles. Lots of new links to WebmasterWorld will be generated by this extreme action. Then, when robots.txt once again allows search engine spiders in, the site is likely to be re-indexed in its entirety by all the engines.
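
For illustration, a Crawl-delay setup along those lines might look like the sketch below. The 10-second value and the `Slurp` and `msnbot` user-agent tokens are assumptions made for the example, not settings taken from any real site. `Crawl-delay` is a non-standard directive that only some crawlers honor, and the parsing here uses Python's standard `urllib.robotparser` (its `crawl_delay` method needs Python 3.6 or later).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask two crawlers to pause between requests.
# The 10-second delay is an assumed value chosen for illustration.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: msnbot
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Bots without a matching record (Googlebot here) get no delay at all.
delays = {bot: rp.crawl_delay(bot) for bot in ("Slurp", "msnbot", "Googlebot")}
print(delays)
```

A rogue bot never reads this file, so the delay only shapes traffic from crawlers that were polite to begin with.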

That will certainly be a heavy crawl schedule to re-index over a million pages by the top search engines, further loading the server and slowing the site for visitors. Perhaps Tabke plans a phased re-crawl by allowing Googlebot to index the site first, then Slurp (Yahoo), then MSN bot, then Teoma. It could be that he's created more work for himself in managing that re-crawl.
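
A phased re-crawl like the one speculated about above could be managed by publishing a different robots.txt at each stage. The sketch below is only one way to do it: the helper names `robots_for_phase` and `allowed_bots` are hypothetical, the user-agent tokens are assumed, and nothing here reflects what Tabke actually plans.

```python
from urllib.robotparser import RobotFileParser

# Order suggested in the text; the tokens are assumed user-agent names.
PHASE_ORDER = ["Googlebot", "Slurp", "msnbot", "Teoma"]


def robots_for_phase(phase):
    """Return robots.txt text admitting only the first `phase` crawlers."""
    # An empty Disallow line means "nothing is off limits" for that bot.
    records = [f"User-agent: {bot}\nDisallow:\n" for bot in PHASE_ORDER[:phase]]
    # Everyone else stays locked out.
    records.append("User-agent: *\nDisallow: /\n")
    return "\n".join(records)


def allowed_bots(robots_txt):
    """List which crawlers in PHASE_ORDER may fetch a sample forum path."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [bot for bot in PHASE_ORDER if rp.can_fetch(bot, "/forum9/")]


for phase in range(1, len(PHASE_ORDER) + 1):
    print(phase, allowed_bots(robots_for_phase(phase)))
```

Each phase admits one more engine, so the re-indexing load arrives in stages rather than all at once.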

When this happens, there'll be thousands of new links from all the buzz and the many articles discussing the bot ban, which will lead to WebmasterWorld becoming even more popular. Many have suggested the extreme move of banning all crawlers was simply a plan to gain public relations value, and links, but somehow I doubt it. Tabke claims the bot ban was done in a moment of frustration after his IP address ban list grew to over 4,000 entries and management of rogue bots became a 10-hour-a-week job.

Barry Schwartz of Search Engine Roundtable interviewed Tabke after his dramatic decision to ban all bots. That interview clears up much of the confusion, but still doesn't fully justify a move that effectively drops over one million pages from Google. http://www.seroundtable.com/archives/002863.html

Web reputation crawlers are partially at play here as well. Corporations looking for online commentary about their company, both positive and negative, use web reputation services that crawl the web with reputation bots (mostly blogs and news stories), looking for comments about their clients that may harm or help them. This may be of value to those corporations, but it needlessly slows site performance and offers no advantage to webmasters. If a site owner has trashed a company on their blog, they certainly don't want the "Web Reputation Police" crawling their content in order to sue them for libel.

Rogue bots are a serious problem, but they simply can't be controlled with robots.txt, because they don't follow its instructions. Tabke said himself that even the cookies and login are useless against serious scraper bots: the bot owner need only log the bot in manually once, which assigns it a cookie, then let it loose within the forums to scrape away automatically once past the gate.
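
Since rogue bots ignore robots.txt, any real control has to happen on the server, for example by watching request rates per client before adding an address to a ban list. The following is a minimal sketch of that idea, not a description of WebmasterWorld's setup: the `RateFlagger` class, its thresholds, and the sample IP address are all hypothetical.

```python
from collections import defaultdict, deque


class RateFlagger:
    """Flag clients that exceed `max_requests` within a sliding time window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> recent request times

    def record(self, client_ip, timestamp):
        """Record one request; return True if the client is over the limit."""
        hits = self._hits[client_ip]
        hits.append(timestamp)
        # Forget requests that have aged out of the window.
        while hits and timestamp - hits[0] >= self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


# Assumed limits: more than 3 requests in any 10-second span is suspicious.
flagger = RateFlagger(max_requests=3, window_seconds=10.0)
burst = [flagger.record("203.0.113.5", t) for t in (0.0, 1.0, 2.0, 3.0)]
later = flagger.record("203.0.113.5", 20.0)
print(burst, later)
```

Flagged addresses would still need human review, since a shared proxy can look like a fast bot, which hints at why managing such a list can eat hours every week.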

I've often wondered why anyone would go to such lengths to steal content and re-use it elsewhere, when it is unlikely to help them in any substantial way. Everyone knows that content is freely available at several article marketing archives, but the rogue bot programmers seek out content that already ranks highly - and fail to realize that there are multiple reasons for those high rankings. Those include off-page factors such as quality, relevant, one-way inbound links from highly ranked blogs and industry news sites. The bad boys out there stealing content won't get those inbound links - OR the high rankings - on the sites where they've posted that scraped content.

Article archives get hit by scraper bots too. Bot programmers would rather write a program that collects content for them (and automatically dumps it into another site) than carefully choose relevant work to post in sensible hierarchies of useful content. It's automated scrape-and-dump laziness. What other reasons would you have for scraping free articles?

The other reason for scraping content would be to plaster it across AdSense and Yahoo Publisher Network (YPN) sites, where it attracts contextual ads, in the hope of clickthroughs from visitors searching on valuable keyword phrases that trigger higher-paying ads. This convoluted thinking results in sites that don't end up ranking very well and don't generate much income for the lazy, bot-programming nerds who create those types of sites.

There are several software and cloaking packages available to lazy webmasters that claim to gather keyword-phrase-based content from across the web via bots and scrapers, then publish that content to "mini-webs" automatically, with no work on your part required. Those pages are cloaked automatically, against search engine best practices, and then AdSense and YPN ads are plastered over those automatically created pages, yes, you guessed it - automatically. It's serious search engine spam, cloaked so that search engines don't know.

One last reason for content scraping is the latest craze: filling fake blogs (also known as Spam Blogs or Splogs) with scraped content, then pinging the blog search services to notify them of new posts. Newly scraped content is added to the blogs constantly, and the pinging suggests that the blog is prolific and should be highly ranked. This is closely related to, and promoted by, the article scrapers mentioned above. It is the latest type of spam being combated by search engines. It seems that search engine spam is just as serious as emailed spam.

Good luck to WebmasterWorld's effort to ban those rogue bots and scrapers!

Copyright December, 2005 by Mike Banks Valentine

Author: Mike Valentine
 
Author Bio:

Mike Valentine

Mike Banks Valentine is a Search Engine Optimization Specialist emphasizing the use of ethical techniques. He has operated a small business ecommerce tutorial at WebSite101 since 1998, offering tutorials on internet business for ecommerce entrepreneurs. A news hound and business enthusiast, he blogs about search engine developments, web content issues, and privacy issues on three separate blogs.

 
 
 

© 2008 www.favoritearticlesinc.com All Rights Reserved.