Mind to Market

Wednesday, May 14, 2008

The Promises of Web 3.0

Put off by the unfulfilled hype of Web 2.0, some in the knowledge management community are now clamoring for Web 3.0. Dr. David J. Roberts, Chief Scientist at iBASEt, writes in Oracle's Profit Online that Web 2.0 technologies aren't really worth the bother and that the real value sought by enterprises lies in Web 3.0 technologies.

The compelling promise of Web 3.0 technologies is the ability to make inferences across contextually linked information, thereby pulling new, creative combinations out of knowledge bases automatically. This is the holy grail of knowledge managers: getting machines to reason, even just slightly, would offer a great deal of value.

With reasoning as such a compelling value proposition, will Web 3.0 technologies render Web 2.0 worthless? There are still some very large obstacles to Web 3.0, as Roberts has described: machines can't handle ambiguity, and major pieces of the Web 3.0 language (ontologies) have yet to be produced.

But is the interim value of Web 2.0 technologies really that low? Web 2.0 technologies provide integration for well defined processes, and it’s the fact that they must be well defined that renders them inflexible and in need of constant maintenance. Yet this integration is indeed quite valuable, too valuable to be left on the shelf until Web 3.0 is ready for prime time. The future world of Web 3.0 is indeed rosy but don't count Web 2.0 out just yet.


Monday, March 24, 2008

Killer Semantic Apps

Is coming up with a definitive application for demonstrating the utility of semantic analysis really that difficult? TextWise, a software development company in Rochester, New York, apparently thinks a good idea for this technology is worth at least the $1 million they are offering the winner of their SemanticHacker $1M Innovators Challenge.

The rules of this contest require the contestant to use, or propose to use, the SemanticHacker API, based on TextWise's Semantic Signatures® technology, to develop a software application for a specific industry vertical. Although it is up to the contestant to propose a vertical, TextWise suggests industries such as "healthcare or pharmaceuticals might be good places to start." Wonder who tipped them off?

As explained in TechCrunch, Semantic Signatures® uses natural-language processing to extract relevant terms from text, then applies semantic analysis to automatically categorize Web pages. Not a bad idea, but the technology can be a bit flaky. Semantic Signatures® uses Wikipedia as a reference, connecting the concepts extracted from the text and matching them to Wikipedia articles.

One of the cornerstones of the W3C specification for the Semantic Web is its use of the Web Ontology Language (OWL). Although OWL only specifies the format of ontologies, the assumption is that human domain experts will be required to develop an ontology accurately. TextWise claims that ontologies developed in this way "do not align with customer needs and…rapidly become obsolete." Perhaps, but without some agreement on the ontology all you have is a folksonomy, which is of less value in collaborative efforts.

The Holy Grail that drives the concept of semantic analysis is the ability for the software to do the "connecting the dots" process that is normally done by humans. We humans can juggle a few thousand "dots" in our heads, but connecting one to another, or maybe some complex combination of five to another twelve, gives most of us a headache. And when you start thinking about connecting a million or more dots, well, time to start thinking about a simpler project, like brain surgery.


Tuesday, October 09, 2007

Data Glut in Research

Chemical & Engineering News placed information management in pharmaceutical R&D on their cover for the October 1 issue. In addition to pointing out some of the information ills that plague the industry, C&EN has added a blurb on the "Internet Orphan" Semantic Web. Although short on concrete examples of just what SW can do, the article points out that SW technologies could replace data-mining as a way to derive knowledge from your data.

So what about this "Internet Orphan" label? Apparently, since SW technology has been around for several years and no one has really picked it up, it has earned the name. Life sciences, with their exploding stores of data and thinning drug pipelines, have a real need for technologies that can wring more knowledge from the databases. And because biology is a science, and is therefore smaller and more quantifiable than the Web as a whole, applying the structure of SW to the biological knowledge domain makes much better sense than applying it to the entire Web.

And lest I forget the hype: SW is being compared to the early days of the Web. "What is happening now on the Semantic Web is similar to what was going on in the five years leading up to that explosion [of 1995 that kicked off the Web]," claims John Wilbanks, executive director of Science Commons.


Friday, September 28, 2007

Data Munging

There has been a flurry of blogs recently from Deepak Singh, Neil Saunders, Hari and others bemoaning the lack of integration among biomedical data sources. Highly trained bioinformaticians are spending most of their time data munging: searching, formatting and other types of mundane processing to prepare the data for analysis, instead of performing the higher-level analyses they've been hired to do.

Part of the problem here is a framing issue: what are the fundamental objectives the bioinformaticians are trying to accomplish? Data munging has become such a predominant activity that many in the field feel that this is the sum total of what bioinformatics is. But as Deepak states: "I believe that knowledge lies in what can be done with data, rather than the data itself." If we spend all our time data munging, will we ever get to a point where we can actually work with the data?

The first step in reducing the amount of processing is to provide syntactic integration: web services or APIs that allow the bioinformatician direct access to the data sources. This is in fact occurring, with many data sources now offering web service interfaces. But syntactic integration is just the start. Although there is access to the data, much of the salient knowledge is lost unless integration takes place on a semantic level.

Semantic integration is usually equated to standard vocabularies or ontologies, topics that are the focus of the Semantic Web for Health Care and Life Sciences Interest Group. The HCLSIG is making some progress, but it's apparent from the comments to the above blogs that many bioinformaticians are unaware of what this group is doing or find it irrelevant to their work. Undoubtedly their work is in its early stages and the standards are still under development, but this technology does hold out some hope of reducing the amount of needless data munging.


Wednesday, September 19, 2007

AI – Semantic Web Link

Back in the day when computers read punch cards and spun tape drives, popular thought was that they would soon develop human-like intelligence, i.e. artificial intelligence (actually artificial general intelligence). Over the past 50 or so years, AI has had enough spins on the hype cycle to effectively deflate any expectations. But a recent article in Dr. Dobb's Journal claims that AI: It’s OK Again! Time to get back on that hype cycle?

Although the AGI folks claim the AI folks have sold out, AI has been making some progress as an engineering practice in areas such as natural language processing, speech recognition, vision and search. Although Tim Berners-Lee says the Semantic Web is not AI, the author of the article thinks that SW is a fertile knowledge base for AI.

Semantic Web does in fact resemble the "connectionist" theme of AI: the theory that to understand brain behavior you must develop a model of the brain's physical structure, its neurons and synapses. The connectivity of SW would therefore mimic this physical structure.

The AGI folks believe that computers will exceed the intelligence of humans anywhere from nine to 40 years from now. I wonder if I could teach one to blog?


Friday, September 14, 2007

NY Times on Semantic Web

Not to be outdone by the Economist, the New York Times ran an article on the Semantic Web recently. Peter Wayner does an admirable job of trying to explain the concepts of SW in layman's terms. Wayner points out the weaknesses of search engines, such as Google, that rely on backlinks or other less discriminating methods of associating terms, versus SW, which provides a more consistent method for making associations.

Another example that he cites is the ability to make rules that connect names with their associated variations, such as Robert and Bob. Although humans can readily make these associations (assuming they are familiar with local naming conventions), computers cannot without explicit instructions. The inference that we can then make using SW is that if an individual is named Robert and another individual is named Bob, and Robert and Bob are equivalent, then the two individuals are one and the same.

Is this starting to convey some of the value of SW technology? He does mention two popular social networking sites, Facebook and LinkedIn, that, although they are not strictly SW technology, have implemented structure in their associations between members and from that can develop rules that are based on that structure. Given the popularity of these sites, this may be an example that people can relate to.
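
The Robert/Bob rule can be sketched in a few lines of Python. This is only an illustration, not anything taken from Wayner's article: the variant table and the function names are made up here, and a real system would draw its equivalences from a shared vocabulary. Note that the sketch also compares last names, since equivalent first names alone are weak evidence that two records describe the same person.

```python
# Hypothetical table of first-name variants (an assumption for this sketch).
NAME_EQUIVALENTS = {"bob": "robert", "rob": "robert", "bill": "william"}


def canonical_first_name(name: str) -> str:
    """Map a first name to its canonical form, e.g. 'Bob' -> 'robert'."""
    key = name.strip().lower()
    return NAME_EQUIVALENTS.get(key, key)


def may_be_same_person(a: tuple[str, str], b: tuple[str, str]) -> bool:
    """True when two (first, last) names agree once variants are normalized."""
    same_first = canonical_first_name(a[0]) == canonical_first_name(b[0])
    same_last = a[1].strip().lower() == b[1].strip().lower()
    return same_first and same_last
```

With the explicit rule in place, may_be_same_person(("Robert", "Smith"), ("Bob", "Smith")) is true, while a pairing of Robert with Bill is not.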


Wednesday, August 29, 2007

Economist on Semantic Web

An indication that the concept of using Semantic Web technologies to improve the Internet may be reaching the mainstream comes in a recent article in the Economist: The long-promised "semantic" web is starting to take shape. The examples are rather lame though, possibly an indication that the most valuable applications of SW technology are not yet in the consumer space but in the more esoteric domains of healthcare and life sciences. I mean, most travel websites do a pretty good job of taking in large amounts of data, then rapidly crunching the numbers to produce a reasonable itinerary in a short amount of time for no fee. Does the consumer require new technology to improve on this? While I doubt the Economist has won over any new converts with this article, keeping this technology on the lay public's radar will be useful in making advances in information technology.


Monday, July 23, 2007

Semantic Web in Health IT

Brian Robinson has written an article on Semantic Web/Web 3.0 (although they are hardly synonyms) in GovernmentHealthIT. Dr. Parsa Mirhaji, director of the Center for Biosecurity and Public Health Informatics Research at the University of Texas, is quoted as saying: "people now have to log on to five or six different systems to get complete information about patients." The assumption is that when everything complies with Semantic Web that will be unnecessary, that this information will be linked semantically between systems. No more endless searches to find the information that you are looking for.

Mirhaji doesn't stop at simply integrating the information though. Computational models based on SW will allow computers to make inferences much like humans. One immediate use of SW is the ability to normalize the medical nomenclature; equating terms that mean the same thing.

Oracle is on the bandwagon and promoting the wonders of SW. Bob Shimp, VP of Oracle's Global Technology Business Unit, is hopeful that SW technologies will be in use in health IT in the next couple years.


Thursday, April 12, 2007

The Future of the Web

BusinessWeek has run a special report on the future of the web in their April 9 issue and seems to be pointing in the direction of Semantic Web technologies. As an example, BW cites the pharmaceutical giant Eli Lilly as an early adopter of SW technologies for use in drug development. In an attempt to slash a third off the cost of developing a new drug, they are implementing SW technologies to better manage their data.

Tim O'Reilly is quoted as saying "Web 2.0 is the messy way that the Semantic Web is actually happening." Messy in the fact that it's a bottom-up folksonomy rather than the top-down ontology driven structure required by SW specifications.

Does this exposure in BW mean SW is ready for prime time? "I'm still trying to figure out what Web 2.0 is" remarked one VC at last night's MIT Enterprise Forum in Boulder. But while the entrepreneurs try to sort out the business model, Lilly's IT department goes it alone, "There aren't a lot of people we can turn to with experience." Seems like there's got to be a business model in saving a pharmaceutical company $400 million off the cost of developing one drug.


Thursday, March 22, 2007

Web 3.0 Already?

In the midst of the Web 2.0 hype cycle, is it time to begin the buzz on Web 3.0? Although Web 2.0 was a big step forward, its limitations are becoming apparent even as its definition has only just been resolved. The connections put in place by Web 2.0 through social networking, folksonomies and tagging have provided a higher level of functionality for some applications, but the connections are only loosely defined. Much more powerful functionality will come with better defined connections and structured frameworks.

Although the term Web 3.0 was never used by the founders of Semantic Web, there is a growing acceptance that the two are synonymous. Certainly the proponents of Semantic Web technologies, including Tim Berners-Lee, could benefit from the idea that their ideas will form the next version of the Web. And it appears that the public is ready for the technology as well: ready for the functionality, if not for the discipline it will demand.

So what can Web 3.0 do that 2.0 cannot? For one, it helps computers better "understand" terms used on the Web. What is the difference between a book and a basketball? Simpler technologies would recognize that they are spelled differently and that would be it. Web 3.0 will categorize them and provide them with a set of associations that will define what they are. A book is a subject that contains information and is associated to readers by a relationship called "is read by." Many such associations can connect the book to other objects, e.g. "is stored on" a bookshelf. As these associated networks grow, more knowledge about what a book is, and how it is distinguished from a basketball, is compiled and a clearer vision of book is developed.
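
As a rough sketch of the idea, each association can be stored as a (subject, relationship, object) statement. The Python below is a toy illustration only: the statements about the basketball and the category names are invented for the example, and no particular Semantic Web library or format is implied.

```python
# Toy statements in (subject, relationship, object) form.
# The basketball entries and the category names are illustrative assumptions.
ASSOCIATIONS = {
    ("book", "is a", "information carrier"),
    ("book", "is read by", "reader"),
    ("book", "is stored on", "bookshelf"),
    ("basketball", "is a", "sports equipment"),
    ("basketball", "is used in", "basketball game"),
}


def associations(subject: str) -> set[tuple[str, str]]:
    """Everything the network says about one subject."""
    return {(rel, obj) for subj, rel, obj in ASSOCIATIONS if subj == subject}


def shared_associations(a: str, b: str) -> set[tuple[str, str]]:
    """Associations two subjects have in common; an empty set means none."""
    return associations(a) & associations(b)
```

Here shared_associations("book", "basketball") comes back empty, so the two terms differ by more than spelling: nothing the network knows about one is true of the other.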

This is a similar process to human learning and, like humans, as the knowledge networks grow they will become more "intelligent." The process will begin with specific knowledge domains, such as libraries of books, airline travel or drug development, and continue, theoretically, until the barriers between the domains break down and connections through the entire Web are established.

One obstacle will be the structure imposed by the Semantic Web. Web 2.0 calls for a very informal structure where users apply their own tags however they see fit, or not at all. Semantic Web, on the other hand, requires strict adherence to its structure if it's going to function correctly.

But the payoff for applying structure is inference and reasoning: the ability for the software to make connections when given the proper data. This starts with simple inferences such as: if hepatitis is a disease and it occurs in the liver, it must be a disease of the liver. Although not rocket science for humans, assembling networks of logical statements in a structured framework will be a big step forward for computers.
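
The hepatitis inference can be written down as a single rule over structured statements. This is a minimal sketch in Python, assuming a simple (subject, relationship, object) layout; the relationship names and the extra "bile" statement are made up for the illustration and do not come from any ontology standard.

```python
# Known statements; "bile" is included only to show the rule does not over-apply.
FACTS = {
    ("hepatitis", "is a", "disease"),
    ("hepatitis", "occurs in", "liver"),
    ("bile", "occurs in", "liver"),
}


def infer_disease_of(facts: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Rule: if X is a disease and X occurs in Y, then X is a disease of Y."""
    diseases = {s for s, rel, obj in facts if rel == "is a" and obj == "disease"}
    return {
        (s, "is a disease of", obj)
        for s, rel, obj in facts
        if rel == "occurs in" and s in diseases
    }
```

Applied to the statements above, the rule yields only ("hepatitis", "is a disease of", "liver"); bile occurs in the liver too, but nothing says it is a disease, so no new statement is made about it.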

Much of human knowledge is acquired over time and through experiences. This type of knowledge is stored away in the brain to be pulled out at a later time when certain associations may be required, say in diagnosing a disease. A less experienced physician may not have experienced a patient with certain symptoms that an older colleague would have. But an effective Web 3.0 knowledge base may supplement the less experienced physician's knowledge and allow her to operate as if she had the knowledge possessed by the more experienced physician.

The value of such a system in the hands of a skilled user is to rapidly amplify the knowledge that they can process. Web 3.0 technology has been called "XML on steroids." Given the discipline required to implement it, however, its use will be constrained to only the most valuable markets for the near term.


Friday, December 29, 2006

I'm Feeling Lucky?

The original Google interface, back when they were just a search engine, had a text entry field and two buttons: Google Search and I'm Feeling Lucky. Most of us click on the Google Search button by instinct perhaps because we've been unsuccessful with the IFL button or perhaps because we just can't believe that it could actually work. The IFL button bypasses the results page and takes you directly to the first web site returned in your query. Could save a lot of time, right? If you are looking for a popular or unambiguous page, say "iPod Shuffle," you may expect to be taken right to Apple's iPod Shuffle page and you'd be right (as of the date of this blog). The one person I talked to who actually used this feature said "when I'm doing a search for http://www.nytimes.com/ it always gets me there." I'm sure people who are a bit more accomplished at using a browser use it too....

But any search that is reasonably ambiguous will require some manual filtering to get you to the site you want. Entering "Steve Connolly" will get you not the famous blog you see here but rather the site of the well-known Elvis impersonator (although we've never been seen in the same room together). In fact, since many of our queries are ambiguous, we prefer the option of manually filtering the results before plunging onto a web page. But what if your workflow required numerous nested queries? Requiring a hybrid automatic/manual process would improve accuracy but slow the process and consume valuable resources. One solution is to reduce ambiguity by standardizing the vocabulary, for example by using standard vocabularies such as ICD-10 or UMLS in the biomedical case. By agreeing on a standard vocabulary, users can quickly determine whether the term "Steve Connolly" refers to someone who writes a blog or someone who croons in a Vegas lounge. Would this put the I'm Feeling Lucky button out of business? Not yet. Although we've agreed to a single definition of the term, there may be many references to it, and these references would be returned in the query. We have, however, significantly cut down on the inaccuracies in our results.

The next step is to standardize the associations to references. Google uses the sheer number of links connecting two terms, which can be misleading. Google bombs are examples of how these associations can be manipulated. One well-known example is the term "miserable failure," which has been linked by many individual Web sites to the biography of George W. Bush; thus, at the time of this blog, a query of "miserable failure" will return the biography as the top-ranked result. Semantic Web technologies seek to provide more explicit associations between terms, replacing statistically based results with results that are definitive.

Once we've decided what a "Steve Connolly" is, next we can ask what a "Steve Connolly" does. Having established a subject, we can then query for associations, or predicates, and the objects connected to those predicates. Implementing the subject, predicate, object format of the Semantic Web is neither easy nor straightforward, and thus the original Web is in no danger of disappearing. But given time and effort the two original Google buttons, Google Search and I'm Feeling Lucky, will gradually merge into one: I'm Feeling Google??
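
To make the subject, predicate, object idea concrete, here is a small Python sketch of a query over such statements. It is a toy under stated assumptions: the disambiguated identifiers and the statements are invented for the example, and real Semantic Web stores use their own data formats and query languages rather than a Python list.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Illustrative statements only; the identifiers are made up for this sketch.
STATEMENTS: list[Triple] = [
    ("Steve Connolly (blogger)", "writes", "Mind to Market"),
    ("Steve Connolly (impersonator)", "impersonates", "Elvis Presley"),
    ("Steve Connolly (impersonator)", "performs in", "Las Vegas"),
]


def match(subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> list[Triple]:
    """Return every statement matching the given parts; None is a wildcard."""
    return [
        (s, p, o)
        for s, p, o in STATEMENTS
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]
```

Once the subject is fixed, match(subject="Steve Connolly (blogger)") returns exactly one statement, with no results page left to filter by hand.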


Wednesday, December 06, 2006

Barriers to the Semantic Web

Although I'm a big fan of semantic web technology (we incorporate it in our product), it isn't the silver bullet some people think it is. Zack Rosen has written an excellent blog on some reasons why semantic web adoption hasn't taken off.
