Mind to Market

Friday, September 28, 2007

Data Munging

There has been a flurry of blog posts recently from Deepak Singh, Neil Saunders, Hari and others bemoaning the lack of integration among biomedical data sources. Highly trained bioinformaticians are spending most of their time data munging: searching, formatting and other mundane processing to prepare the data for analysis, instead of performing the higher-level analyses they've been hired to do.

Part of the problem here is one of framing: what are the fundamental objectives the bioinformaticians are trying to accomplish? Data munging has become such a predominant activity that many in the field feel it is the sum total of what bioinformatics is. But as Deepak states: "I believe that knowledge lies in what can be done with data, rather than the data itself." If we spend all our time data munging, will we ever get to the point where we can actually work with the data?

The first step in reducing the amount of processing is to provide syntactic integration: web services or APIs that give the bioinformatician direct access to the data sources. This is in fact occurring, with many data sources now offering web service interfaces. But syntactic integration is just the start. Although the data becomes accessible, much of the salient knowledge is lost unless integration also takes place at the semantic level.

Semantic integration is usually equated with standard vocabularies or ontologies, topics that are the focus of the Semantic Web for Health Care and Life Sciences Interest Group (HCLSIG). The HCLSIG is making some progress, but it's apparent from the comments on the above blogs that many bioinformaticians are unaware of what this group is doing or find it irrelevant to their work. Admittedly, the group's work is in its early stages and the standards are still under development, but this technology does hold out some hope of reducing the amount of needless data munging.
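To make the syntactic/semantic distinction concrete, here is a toy sketch. The two sources, their field names, the records and the mapping tables are all hypothetical, and plain Python dictionaries stand in for a real ontology (this is not RDF, OWL or anything the HCLSIG has specified). Both records describe the same human gene, and both are "accessible" once parsed from their respective services, yet a join on the raw values finds nothing. Only after fields and values are mapped to shared identifiers (here NCBI Taxonomy ID 9606 for human, and upper-cased gene symbols, a deliberately simplistic assumption) do the records line up.

```python
# Hypothetical records for the same gene from two sources, as they might look
# after syntactic integration (e.g. parsed from two different web services).
source_a = [{"gene_symbol": "TP53", "organism": "Homo sapiens", "expression": 2.4}]
source_b = [{"symbol": "tp53", "species": "human", "go_terms": ["GO:0006915"]}]

# Hypothetical mapping from each source's field names to shared keys.
FIELD_MAP = {
    "a": {"gene_symbol": "gene", "organism": "taxon"},
    "b": {"symbol": "gene", "species": "taxon"},
}

# Hypothetical vocabulary: organism labels mapped to NCBI Taxonomy IDs.
TAXON_IDS = {"homo sapiens": 9606, "human": 9606}


def naive_join(records_a, records_b):
    """Syntactic-only join: compare raw values of the 'same' field."""
    return [
        {**a, **b}
        for a in records_a
        for b in records_b
        if a.get("gene_symbol") == b.get("symbol")
    ]


def normalize(record, source):
    """Rename fields to shared keys and map values to shared identifiers."""
    out = {}
    for key, value in record.items():
        shared = FIELD_MAP[source].get(key, key)
        if shared == "gene":
            value = value.upper()  # simplistic assumption: symbols differ only by case
        elif shared == "taxon":
            value = TAXON_IDS[value.lower()]  # label -> taxonomy ID
        out[shared] = value
    return out


def merge(records_a, records_b):
    """Semantic join: match normalized records on (gene, taxon)."""
    index = {}
    for rec in records_a:
        n = normalize(rec, "a")
        index[(n["gene"], n["taxon"])] = n
    merged = []
    for rec in records_b:
        n = normalize(rec, "b")
        key = (n["gene"], n["taxon"])
        if key in index:
            merged.append({**index[key], **n})
    return merged


if __name__ == "__main__":
    print(naive_join(source_a, source_b))  # no matches on raw values
    print(merge(source_a, source_b))       # one merged record
```

The hand-built mapping tables are exactly the kind of per-project munging the bloggers complain about; the hope held out by shared vocabularies is that the data sources would publish such identifiers themselves, so each bioinformatician would not have to rebuild them.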
