Friday, December 29, 2006

I'm Feeling Lucky?

The original Google interface, back when Google was just a search engine, had a text entry field and two buttons: Google Search and I'm Feeling Lucky (IFL). Most of us click the Google Search button by instinct, perhaps because we've been unsuccessful with the IFL button or perhaps because we just can't believe that it could actually work. The IFL button bypasses the results page and takes you directly to the first web site returned by your query. Could save a lot of time, right? If you are looking for a popular or unambiguous page, say "iPod Shuffle," you may expect to be taken right to Apple's iPod Shuffle page, and you'd be right (as of the date of this post). The one person I talked to who actually used this feature said "when I'm doing a search for http://www.nytimes.com/ it always gets me there." I'm sure people who are a bit more accomplished at using a browser use it too...

But any search that is reasonably ambiguous will require some manual filtering to get you to the site you want. Entering "Steve Connolly" will get you not the famous blog you see here but the site of the well-known Elvis impersonator (although we've never been seen in the same room together). In fact, since many of our queries are ambiguous, we prefer to filter the results manually before plunging onto a web page. But what if your workflow required numerous nested queries? A hybrid automatic/manual process would improve accuracy, but it would slow things down and consume valuable resources. One solution is to reduce ambiguity by standardizing the vocabulary, for example by using ICD-10 or UMLS in the biomedical domain. By agreeing to a standard vocabulary, users can quickly determine whether the term "Steve Connolly" refers to the one who writes a blog or the one who croons in a Vegas lounge. Would this put the I'm Feeling Lucky button out of business? Not yet. Although we've agreed to a single definition of the term, there may be many references to it, and all of these references would be returned by the query. We have, however, significantly cut down on the inaccuracies in our results.
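To make the point about definitions versus references concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the identifiers (SC-0001, SC-0002), the document titles, and the function names are assumptions, not entries from ICD-10, UMLS, or any real vocabulary.

```python
# Hypothetical controlled vocabulary: each identifier has exactly one definition.
VOCABULARY = {
    "SC-0001": "Steve Connolly, blog author",
    "SC-0002": "Steve Connolly, Elvis impersonator",
}

# Hypothetical documents, each tagged with the identifier it refers to.
DOCUMENTS = [
    ("blog home page", "SC-0001"),
    ("blog archive", "SC-0001"),
    ("tribute show site", "SC-0002"),
]


def concepts_for_label(label):
    """Return the identifiers whose definition mentions the free-text label."""
    return sorted(cid for cid, definition in VOCABULARY.items() if label in definition)


def references_to(concept_id):
    """Return the titles of all documents tagged with one identifier."""
    return [title for title, cid in DOCUMENTS if cid == concept_id]


# The bare name is ambiguous: it matches both identifiers.
candidates = concepts_for_label("Steve Connolly")

# One identifier has one meaning, but it can still have several references.
blog_pages = references_to("SC-0001")
```

The free-text name matches two identifiers, while a single identifier still comes back with more than one page. That is the situation described above: the ambiguity about meaning is gone, but the query can still return many references.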

The next step is to standardize the associations to references. Google relies on the sheer number of links connecting two terms, which can be misleading. Google bombs are examples of how these associations can be manipulated. One well-known example is the term "miserable failure," which many individual Web sites have linked to the biography of George W. Bush; as a result, at the time of this post a query for "miserable failure" returns the biography as the top-ranked result. Semantic Web technologies seek to provide more explicit associations between terms, replacing statistically based results with results that are definitive.
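The difference between counting links and stating an association outright can be sketched in a few lines of Python. This is a toy model only, not Google's actual ranking algorithm; the URLs, the link counts, and the names top_by_link_count and EXPLICIT are all assumptions made up for the example.

```python
from collections import Counter


def top_by_link_count(links, phrase):
    """Return the page most often linked with the given anchor phrase, or None."""
    counts = Counter(target for (anchor, target) in links if anchor == phrase)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical link data as (anchor text, target page) pairs.
organic = [("miserable failure", "example.org/essay")] * 3
bomb = [("miserable failure", "example.org/biography")] * 10

# Ranking by raw link count: the answer flips once the coordinated links appear.
before = top_by_link_count(organic, "miserable failure")
after = top_by_link_count(organic + bomb, "miserable failure")

# An explicit association is stated once and does not depend on link counts.
EXPLICIT = {"miserable failure": "example.org/dictionary-entry"}
definitive = EXPLICIT.get("miserable failure")
```

In this toy model, ten coordinated links are enough to change the top result, while the explicitly stated association stays the same no matter how many links are added.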

Once we've decided what a "Steve Connolly" is, we can ask what a "Steve Connolly" does. Having established a subject, we can query for associations, or predicates, and for the objects connected to those predicates. Implementing the subject, predicate, object format of the Semantic Web is neither easy nor straightforward, and thus the original Web is in no danger of disappearing. But given time and effort, the two original Google buttons, Google Search and I'm Feeling Lucky, will gradually merge into one: I'm Feeling Google?
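For the curious, the subject, predicate, object idea can be sketched with plain Python tuples. This is only an illustration under stated assumptions: the identifiers, the predicate names (hasName, writes, performsAs), and the match function are invented here, and a real Semantic Web system would use RDF and a query language such as SPARQL instead.

```python
# Hypothetical triples in (subject, predicate, object) form.
TRIPLES = [
    ("person:steve-connolly-1", "hasName", "Steve Connolly"),
    ("person:steve-connolly-1", "writes", "blog:this-blog"),
    ("person:steve-connolly-2", "hasName", "Steve Connolly"),
    ("person:steve-connolly-2", "performsAs", "Elvis impersonator"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the fields that are not None."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# What is a "Steve Connolly"? Two distinct subjects share the name.
subjects = [s for (s, _, _) in match(TRIPLES, predicate="hasName", obj="Steve Connolly")]

# What does the first one do? Every predicate and object except the name.
activities = [(p, o) for (_, p, o) in match(TRIPLES, subject=subjects[0]) if p != "hasName"]
```

Leaving a field as None works like a wildcard, so the same function answers both "who has this name?" and "what does this subject do?"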

