Mind to Market

Tuesday, November 13, 2007

Google's OpenSocial

I've been investigating Google's new open standard for social networks, OpenSocial. Is this a big step forward in integrating social network sites or an attempt to block Facebook's advance? There is no doubt that developers face a confusing assortment of APIs to negotiate if they are going to link information from one Web site to another. And the information these APIs present is fairly standard stuff: names, demographics, relations. So why shouldn't it be standardized to make it easier to access?

Well, for one thing, Facebook is huge and growing bigger every day. They have an API which plenty of developers use. The big kid doesn't need a standard; it's the runners-up that need one if they have any hope of staying in the game. And Google is certainly playing catch-up here.

In many ways Google's initiative is similar to the caBIG initiative set up by the National Cancer Institute three years ago. Concerned about the proliferation of data silos among their grant recipients that hindered sharing and access to data, the NCI set up the Cancer Biomedical Informatics Grid (caBIG) as a way to standardize APIs among institutions. NCI has even set up a certification process to rate institutions on how well they comply with these standards.

An unfortunate downside to this standardization of APIs is the inability to access specialized functions or data that may reside at the data source. The standardization committee can be petitioned to extend the standard, but it is inevitable that not all terminology will be included. In some cases the data source will have its own API through which developers can access the non-standard information.

Social networks have a long way to go before they reach the level of complexity of cancer biology, so imposing a standard now may not be a big restriction on the information. At least developers will now have a choice: use the OpenSocial API to access all social networks, or use site-specific APIs if there is some specialized information to retrieve.


Friday, December 29, 2006

I'm Feeling Lucky?

The original Google interface, back when they were just a search engine, had a text entry field and two buttons: Google Search and I'm Feeling Lucky. Most of us click on the Google Search button by instinct, perhaps because we've been unsuccessful with the IFL button or perhaps because we just can't believe that it could actually work. The IFL button bypasses the results page and takes you directly to the first web site returned by your query. Could save a lot of time, right? If you are looking for a popular or unambiguous page, say "iPod Shuffle," you may expect to be taken right to Apple's iPod Shuffle page, and you'd be right (as of the date of this post). The one person I talked to who actually used this feature said "when I'm doing a search for http://www.nytimes.com/ it always gets me there." I'm sure people who are a bit more accomplished at using a browser use it too....

But any search that is reasonably ambiguous will require some manual filtering to get you to the site you want. Entering "Steve Connolly" will get you not the famous blog you see here but rather the site of the well-known Elvis impersonator (although we've never been seen in the same room together). In fact, since many of our queries are ambiguous, we prefer the option of manually filtering the results before plunging onto a web page. But what if your workflow required numerous nested queries? A hybrid automatic/manual process would improve accuracy, but it would slow the process and consume valuable resources. One solution is to reduce ambiguity by standardizing the vocabulary, for example by using standard vocabularies such as ICD-10 or UMLS in the biomedical domain. By agreeing to a standard vocabulary, users can quickly determine whether the term "Steve Connolly" writes a blog or croons in a Vegas lounge. Would this put the I'm Feeling Lucky button out of business? Not yet. Although we've agreed to a single definition of the term, there may be many references to it, and these references would all be returned by the query. We have, however, significantly cut down on the inaccuracies in our results.
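
To make the idea concrete, here is a minimal sketch of a standard vocabulary in Python. Everything in it is made up for illustration: the identifiers, the descriptions, and the function names are assumptions, not entries from ICD-10, UMLS, or any real vocabulary.

```python
# Hypothetical controlled vocabulary: one free-text label can map to
# several concepts, each with its own unique identifier.
# All identifiers and descriptions below are invented for illustration.
VOCABULARY = {
    "steve connolly": {
        "person:0001": "blogger",
        "person:0002": "Elvis impersonator",
    },
}


def candidates(label):
    """Return every concept that shares the free-text label."""
    return VOCABULARY.get(label.strip().lower(), {})


def resolve(label, concept_id):
    """Return the one concept picked out by its standard identifier."""
    return candidates(label)[concept_id]


if __name__ == "__main__":
    # The bare name is ambiguous: it matches more than one concept.
    print(candidates("Steve Connolly"))
    # The agreed identifier is not.
    print(resolve("Steve Connolly", "person:0001"))
```

The free-text name still needs a human to choose among the candidates; the agreed identifier does not, which is the whole point of standardizing the vocabulary.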

The next step is to standardize the associations to references. Google relies on the sheer number of links connecting two terms, which can be misleading. Google bombs are examples of how these associations can be manipulated. One well-known example is the term "miserable failure," which many individual Web sites have linked to the biography of George W. Bush; as a result, at the time of this post a query of "miserable failure" returns the biography as the top-ranked result. Semantic Web technologies seek to provide more explicit associations between terms, replacing statistically based results with results that are definitive.
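
A toy sketch shows why counting links is so easy to game. This is not Google's actual ranking algorithm, just a deliberately naive stand-in that ranks pages by how many links use a given phrase as their anchor text; the link data and the `rank` function are invented for illustration.

```python
from collections import Counter

# Invented (anchor_text, target_page) pairs standing in for links
# found on many independent Web sites.
LINKS = [
    ("miserable failure", "example.org/biography"),
    ("miserable failure", "example.org/biography"),
    ("miserable failure", "example.org/biography"),
    ("miserable failure", "example.org/dictionary"),
    ("ipod shuffle", "example.org/ipod-shuffle"),
]


def rank(term, links):
    """Rank target pages by how many links use `term` as anchor text."""
    counts = Counter(target for anchor, target in links if anchor == term)
    return [target for target, _ in counts.most_common()]


if __name__ == "__main__":
    # Enough sites agreeing on one anchor text pushes the page to the top.
    print(rank("miserable failure", LINKS))
```

Nothing in the count says whether the association is true; it only says that enough people made it.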

Once we've decided what a "Steve Connolly" is, we can next ask what a "Steve Connolly" does. Having established a subject, we can then query for associations, or predicates, and the objects connected to those predicates. Implementing the subject, predicate, object format of the Semantic Web is neither easy nor straightforward, and thus the original Web is in no danger of disappearing. But given time and effort, the two original Google buttons, Google Search and I'm Feeling Lucky, will gradually merge into one: I'm Feeling Google??
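
For the curious, the subject, predicate, object idea can be sketched in a few lines of Python. This is only an in-memory illustration, not RDF or any real Semantic Web toolkit; the identifiers, predicate names, and the `query` helper are all assumptions made up for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Invented facts: two different subjects that happen to share a name.
TRIPLES = [
    Triple("person:0001", "name", "Steve Connolly"),
    Triple("person:0001", "writes", "blog:mind-to-market"),
    Triple("person:0002", "name", "Steve Connolly"),
    Triple("person:0002", "performs_as", "Elvis impersonator"),
]


def query(subject=None, predicate=None, obj=None):
    """Return triples matching every field that is not None."""
    return [
        t for t in TRIPLES
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # What is a "Steve Connolly"? Two distinct subjects carry the name.
    print(query(predicate="name", obj="Steve Connolly"))
    # What does this particular one do?
    print(query(subject="person:0001", predicate="writes"))
```

Once the subject is pinned down, asking what it does is an exact lookup rather than a popularity contest.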
