The vast majority of the Web is intended for human readers. The goal has been to create an online experience for human beings, and the result is an open, ever-growing body of information. That is all great, but it presents some problems. There is just too much out there. We aren't sure which information to trust. We can get lost in the Web and waste a lot of time. So we need software tools to help us, but the information itself is not structured in a way that software can easily deal with. Enter the machine readable Web.

The most basic way for software to deal with information on the Web is simply to read the HTML of the pages and "analyze" it. This is what search engines do. They have software agents called spiders that walk the Web and index the pages, then use various techniques to give us the "best" pages for the search queries we enter. This is helpful and essential, but you still have to visit the pages (many pages) and try to find what you want. And you need to know when to go back for updated information. You may know that a page has the information you want and that it will be updated regularly, but you don't want to return again and again just to pull that one bit of information off that page.

There are tools called "screen scrapers," or Web page extractors, that can read pages and extract just the information you want, but the pages are unstructured and changing. The rules you describe for extracting the information may be complex and may stop working when the page changes. And content providers often don't want you to use their pages that way. They want you to look at the whole page, so that you will see the other messages they have there (like marketing messages), not just the bit you want. They try to put up a "no droids allowed" sign; in this case, "no robots, we want human eyeballs only."
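To make the fragility of screen scraping concrete, here is a minimal sketch using only Python's standard library. The page snippet and the "price" class name are invented for illustration; the point is that the extraction rule depends entirely on markup details the content provider is free to change at any time.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the class name "price" is an assumption.
PAGE = '<html><body><div class="price">$19.99</div></body></html>'

class PriceScraper(HTMLParser):
    """Pulls the text of the first <div class="price"> element."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        # This rule breaks silently if the site renames the class
        # or wraps the price in a different tag.
        if tag == "div" and ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price and self.price is None:
            self.price = data
            self.in_price = False

scraper = PriceScraper()
scraper.feed(PAGE)
print(scraper.price)  # → $19.99
```

A real scraper accumulates many such rules, one per site, and each one is hostage to the next page redesign.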
Some content providers realize that you can't always come to their site, and that if they give you a useful summary of what is on it, you might come more often to see the details (and the other stuff you really don't want to see, but live with to get the content you want). A very useful way of doing this is RSS. RSS (Really Simple Syndication) provides the summary in an XML file that a software agent can easily process. RSS news readers, or information aggregators, fetch the summary for you, and then you can decide whether to click through to see the details.

According to a Pew Internet report (see http://www.pewinternet.org/PPF/r/144/report_display.asp), 5% of Internet users are using RSS. Most of these people are classic early adopters, but it seems RSS is moving quickly toward wider adoption. Even this relatively simple standard was not easy to arrive at: there was a lot of conflict between the "keep it simple" crowd and the "more features" crowd (see http://itpapers.zdnet.com/whitepaper.aspx?scname=GSM&docid=97767).

So a machine readable Web is starting to become a reality with RSS and Web services, and it may progress even further with something like machine-to-machine communication or the Semantic Web. Early-adopter consumers are starting to embrace the idea via RSS. The key will be for content providers to adopt a richer set of machine readable formats, as they have started to do with RSS, while keeping them as simple as possible so that a wide variety of software developers can provide tools for the end users. This may be the key to making the Web even more useful.

Ron Tower is the President of Sugarloaf Software and is the developer of Personal Watchkeeper, an information aggregator supporting a variety of ways to summarize the Web. http://www.sugarloafsw.com
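To illustrate why an RSS summary is so much easier for a software agent to process than a full Web page, here is a minimal sketch using Python's standard library. The feed content, titles, and links are all invented for illustration; because RSS gives every item the same predictable structure, no site-specific extraction rules are needed.

```python
import xml.etree.ElementTree as ET

# A made-up minimal RSS 2.0 feed; the site, titles, and links are
# illustrative only.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Site</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(FEED)

# Every <item> has the same shape, so one generic rule handles any feed.
items = [(item.findtext("title"), item.findtext("link"))
         for item in root.iter("item")]

for title, link in items:
    print(f"{title}: {link}")
```

This is the whole job of an aggregator's fetch step: the structure is guaranteed by the format, not reverse-engineered from each site's HTML.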
The Machine Readable Web