Mind to Market

Friday, September 28, 2007

Data Munging

There has been a flurry of blog posts recently from Deepak Singh, Neil Saunders, Hari and others bemoaning the lack of integration among biomedical data sources. Highly trained bioinformaticians are spending most of their time data munging: searching, formatting and other mundane processing to prepare the data for analysis, instead of performing the higher-level analyses they've been hired to do.

Part of the problem here is a framing issue: what are the fundamental objectives the bioinformaticians are trying to accomplish? Data munging has become such a predominant activity that many in the field feel it is the sum total of what bioinformatics is. But as Deepak states: "I believe that knowledge lies in what can be done with data, rather than the data itself." If we spend all our time data munging, will we ever get to a point where we can actually work with the data?

The first step in reducing the amount of processing is to provide syntactic integration: web services or APIs that give the bioinformatician direct access to the data sources. This is in fact occurring, with many data sources now offering web service interfaces. But syntactic integration is just the start. Although there is access to the data, much of the salient knowledge is lost unless integration takes place on a semantic level.

Semantic integration is usually equated with standard vocabularies or ontologies, topics that are the focus of the Semantic Web for Health Care and Life Sciences Interest Group. The HCLSIG is making some progress, but it's apparent from the comments on the above blogs that many bioinformaticians are unaware of what this group is doing or find it irrelevant to their work. Undoubtedly the group's work is in its early stages and the standards are still under development, but this technology does hold out some hope of reducing the amount of needless data munging.
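To make the distinction concrete, here is a minimal Python sketch of what the semantic step adds once syntactic access is in place. Everything in it is invented for illustration: the two record sets, their field names, the tissue terms and the "EX:" concept IDs are hypothetical and do not come from any real data source or ontology.

```python
# Hypothetical records as two data sources might return them over a web
# service. Both are easy to fetch (syntactic integration), but they use
# different field names and different words for the same tissue.
SOURCE_A = [
    {"gene": "abc1", "tissue": "liver"},
    {"gene": "xyz2", "tissue": "kidney"},
]
SOURCE_B = [
    {"symbol": "ABC1", "organ": "hepatic tissue"},
    {"symbol": "XYZ2", "organ": "renal tissue"},
]

# Hypothetical shared vocabulary: each local term maps to one concept ID.
# A real project would take this mapping from a published ontology.
TISSUE_VOCAB = {
    "liver": "EX:0001",
    "hepatic tissue": "EX:0001",
    "kidney": "EX:0002",
    "renal tissue": "EX:0002",
}


def normalize(record, gene_field, tissue_field):
    """Rewrite one source-specific record in the shared terms."""
    term = record[tissue_field].strip().lower()
    return {
        "gene": record[gene_field].upper(),
        # None marks a term the shared vocabulary does not cover.
        "tissue_id": TISSUE_VOCAB.get(term),
    }


def integrate(source_a, source_b):
    """Group records from both sources by (gene, shared tissue concept)."""
    merged = {}
    for rec in source_a:
        n = normalize(rec, "gene", "tissue")
        merged.setdefault((n["gene"], n["tissue_id"]), []).append("A")
    for rec in source_b:
        n = normalize(rec, "symbol", "organ")
        merged.setdefault((n["gene"], n["tissue_id"]), []).append("B")
    return merged


if __name__ == "__main__":
    for key, sources in integrate(SOURCE_A, SOURCE_B).items():
        print(key, sources)
```

Without the shared vocabulary, "liver" and "hepatic tissue" would look like unrelated values and the records would never line up. Writing and maintaining that mapping by hand for every pair of sources is exactly the munging that standard vocabularies are meant to take off the bioinformatician's plate.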


Monday, August 27, 2007

Programming Biologists

In an interview for the blog Blind.Scientist, Alexei Drummond gave his opinion on the difference between developing software applications in academia and in the private sector. Academics develop software "for the love of it," he said, meaning "for themselves" first and for others only secondarily. Software from private or commercial enterprises is developed primarily for use by others, but the issue there is keeping the software abreast of the science.

Drummond should know the difference: he has worked both in academia and in the private sector, as the Chief Scientist at Biomatters, a bioinformatics company he helped found. During his tenure in academia, Drummond was one of the few people in his department who could program and thus was frequently recruited to write scripts to help the other scientists in their research. Although this role can be quite satisfying, it distracts scientists from their own research and can lead to a career shift, as Sandra Porter points out.

Can a biologist learn to program and remain a true research scientist? Some basic scripting skills are becoming a normal part of a science education, but when the primary purpose of your programs is to enable others' research, you cross a line between biological research and professional programming. Not that this is a fine line; it's actually very wide, and most research scientists will find it quite difficult to cross unintentionally. Determining just how much programming is enough and when to call in a professional is difficult, especially if the solution to your research problem is only one script, or database, or programming language away. As Drummond says "I used to program ten different languages very well." Ten?!

It will become all too apparent that people who spend their entire day (or career) thinking in software development terms have developed more refined skills and higher productivity than those who only dabble in programming. Don't forget that many of the more refined skills that programmers develop serve to make applications easier for others to use. These are skills that most programmer/scientists will never need, so don't worry if you have less than ten years of experience under your belt.

Having no programming skills does put one at a disadvantage in an age where software is becoming commonplace. Relying on a third party to accomplish even rudimentary tasks can be frustrating as well as inefficient. Lacking programming skills also makes communicating with programmers more difficult, adding complexity to an already inefficient process.


Wednesday, October 04, 2006

ISCB Rocky '06

If you enjoy talking bioinformatics with some of the most creative minds in the field, AND streaking down some of the best slopes on earth, then don't miss the 4th Annual Rocky Mountain Bioinformatics Conference in Aspen from Dec. 1-3. Dr. Susan Trapp of the University of Colorado Computational Bioscience Program co-chairs the conference, which brings together computational biologists in a relaxed setting conducive to sharing ideas and building the nascent bioinformatics community. This regional meeting is an official conference of the International Society for Computational Biology. Don't forget the October 10th abstract submission deadline!


Friday, September 01, 2006

Biz Plan

I've been heads down on a business plan lately and all that that entails: mostly combing through hundreds of articles, documents and websites. And of course talking to people, mainly my partner Frank, who has firsthand knowledge of the industry from a technical point of view. Not that he didn't observe the business side and reach some understandings about it. The bioinformatics industry has gone through some very dramatic transformations, as has all of IT, over the last ten years. Just by coincidence I ran into the former CFO of Genomica at BioWest last week. Genomica had raised a total of $120 million in a matter of months on the basis of a business plan, a management team and a whole lotta hype. Those were the days!!
