Link Reform To The Rescue

Okay, I can't wait any longer. I've had this in the works for a couple of months, and now that Andy has started a similar thread I'm forced to release my findings.
There's a W3C specification that states there can be only one robots.txt per site, and that it must reside in the root directory or it will be ignored.
But wait, there's a META Robots tag that can be implemented on individual pages; doesn't that solve the problem? In short, no.
Even if a web site owner can change the tag and tell a spider not to index a particular page, the tag can't be applied to a specific domain, since each alias serves up the exact same content: every aliased domain uses the same files as every other site hosted in that virtual directory. This can be changed programmatically, but again, most site owners don't have access to the tools or resources needed to accomplish such a task.
The syntax is <meta name="robots" content="...">, and the allowed terms in content are index, noindex, follow, and nofollow; nothing in that list lets you single out a domain.

Update: I've been told by an undisclosed Google source that less than 2% of Google's index consists of duplicate content. Even that seemingly small number equates to 160,000,000 pages of duplicate content in Google's 8,000,000,000-page index. I think that data alone is enough to raise a few eyebrows.
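To make the alias problem concrete, here's a quick illustration (a hypothetical sketch; www.mainurl.com and www.aliasurl.com are placeholders for any pair of aliased domains). Because both hostnames serve the same files, fetching the same path from each returns byte-for-byte identical pages, META Robots tag and all:
=====================================
import hashlib
import urllib.request

# Hypothetical aliased hosts serving the same virtual directory;
# substitute any pair of domains that point at the same files.
HOSTS = ["www.mainurl.com", "www.aliasurl.com"]
PATH = "/index.html"

digests = {}
for host in HOSTS:
    with urllib.request.urlopen(f"http://{host}{PATH}") as response:
        digests[host] = hashlib.md5(response.read()).hexdigest()

# Identical digests mean identical pages: the META Robots tag is the
# same on every alias, so it can't tell a spider which domain is "real".
print(digests)
=====================================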
Once we have accurate data we can show, beyond a shadow of a doubt, that this change is a vital and necessary step in the evolution of the internet.
One Possible Solution
I propose that an additional file be created that resides in the root directory of a web site. The file name is not important (linkreform.txt, anyone?) but its functionality is critical. This file should address the single biggest issue facing search relevance: duplicate content, and the effect it is having on the search industry as a whole.
Possible Syntax A
=====================================
# linkreform.txt for http://www.mainurl.com/
#
# $Id: linkreform.txt,v 1.01a 2004/11/15 01:33:07 jdowdell
#
# Identify Main URLs That Should Be Crawled
#
# Main URL: preferred crawler starting point and the URL
# to be used in search results
Parent-Domain: www.mainurl.com
# First Alias - non-www version of the URL
Alias-Domain: mainurl.com
# Second Alias - .net version of the URL
Alias-Domain: www.mainurl.net
Alias-Domain: mainurl.net
# Additional Alias - completely different domain name
Alias-Domain: www.aliasurl.com
Alias-Domain: aliasurl.com
# Additional Alias - completely different domain name
Alias-Domain: www.aliasurl.net
Alias-Domain: aliasurl.net
# Additional Alias - completely different domain name
Alias-Domain: www.aliasurl-a.com
Alias-Domain: aliasurl-a.com
# Additional Alias - completely different domain name
Alias-Domain: www.aliasurl-a.net
Alias-Domain: aliasurl-a.net
=====================================
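None of this would take much crawler-side work. Here's a rough sketch, in Python, of how a spider might consume a Syntax A file (parse_linkreform is my own invented helper, not part of any spec): it reads the Parent-Domain and Alias-Domain lines and folds every alias into the parent before indexing.
=====================================
def parse_linkreform(text):
    """Parse a Syntax A linkreform.txt into (parent, aliases)."""
    parent, aliases = None, []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "parent-domain":
            parent = value
        elif field == "alias-domain":
            aliases.append(value)
    return parent, aliases

# A spider could then map every alias to the parent before indexing:
with open("linkreform.txt") as f:
    parent, aliases = parse_linkreform(f.read())
canonical = {alias: parent for alias in aliases}
=====================================
One pass over one tiny file, and the engine knows every alias belongs to www.mainurl.com.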
Here's another proposed format that simply states the main URL the spiders should crawl; it lets site owners anonymously own several domains and point them at the same place without giving competitors any information about their aliases.
Possible Syntax B
=====================================
# linkreform.txt for http://www.mainurl.com/
#
# $Id: linkreform.txt,v 1.01b 2004/11/15 01:38:11 jdowdell
#
# Identify Main URLs That Should Be Crawled
#
# Main URL: preferred crawler starting point and the URL
# to be used in search results
Parent-Domain: www.mainurl.com
=====================================
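The crawler-side check for Syntax B is even simpler. A sketch (again hypothetical; canonical_host is an invented name): whatever host the spider happens to be crawling, it credits the content to the declared Parent-Domain, and no alias list is ever exposed.
=====================================
def canonical_host(current_host, linkreform_text):
    """Return the host a spider should credit this site's content to.

    Syntax B declares only Parent-Domain, so any host whose
    linkreform.txt names a different parent is treated as an alias.
    """
    for line in linkreform_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.lower().startswith("parent-domain:"):
            return line.split(":", 1)[1].strip()
    return current_host  # no declaration: index the host as-is

# Crawling www.aliasurl.net, whose file names www.mainurl.com as parent:
print(canonical_host("www.aliasurl.net", "Parent-Domain: www.mainurl.com"))
# -> www.mainurl.com
=====================================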
The easiest implementation would be for the W3C to amend the robots.txt specification and allow the following line to be added to the file.
Possible Syntax C
=====================================
# robots.txt for http://www.mainurl.com/
#
# $Id: robots.txt,v 1.01b 2004/11/15 01:41:23 jdowdell
#
# Main URL: preferred crawler starting point and the URL
# to be used in search results
User-agent: *
URL-to-crawl: www.mainurl.com
=====================================
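Honoring the new field would be a small patch to any existing robots.txt parser. A sketch (hypothetical; URL-to-crawl is the proposed directive, not part of today's spec):
=====================================
def url_to_crawl(robots_txt, user_agent="*"):
    """Extract the proposed URL-to-crawl field for a user-agent record."""
    active = False
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            active = value in ("*", user_agent)
        elif field == "url-to-crawl" and active:
            return value
    return None

print(url_to_crawl("User-agent: *\nURL-to-crawl: www.mainurl.com"))
# -> www.mainurl.com
=====================================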
By implementing this new standard we could...