Science 4 Data

Data science is a very confusing term. It covers everything from people who use Excel, to statistical regression analysis, to sophisticated pattern recognition and ultimately machine learning. It gets confused with business intelligence, graphing, or visualization. It is the latest tech marketing buzzword, an attempt to package a set of tools, skills, or activities into a group that can be sold.

Most of what you are likely to find are people who do some nice graphics, maybe allowing for interactive drill-down. This is more business intelligence than data science. Sometimes you might find people doing some statistics and calling it data science instead.

What you will rarely find, and this is what separates the real from the rest, are people applying scientific computing to large data samples to find patterns and insights that cannot be arrived at any other way. This is what data scientists do. So “data science” is scientific computation on large datasets, using mathematics and algorithms to find insights that could not be reached through any manual human process.

This is where the greatest value is created. It is how DNA sequencing was achieved, how Amazon makes recommendations, and how Google arrives at search results.

This is not a commodity. One cannot learn these techniques in a 12-week crash course.

It is not just a set of techniques, either. There may be a set of common tools, but in each problem domain someone has to formulate a hypothesis, design the experiment, capture the data, do the analysis, and draw interpretations that lead to more questions, and then the cycle iterates again. This is the scientific process applied to data. This can be frustrating to a business that wants answers, not more questions. Businesses want sound-bite prescriptions, but those that understand how to seize the opportunity of the scientific method are able to separate themselves from the competition.

This gets missed by most businesses, which want to grossly simplify things down to a slide in a PowerPoint deck. That can be done as a point-in-time checkpoint, but it misses the main benefit: the insights gained from the process itself.

This is why banks have teams of quants, and trading firms have the same. These quants do active work each day, using these tools to iterate, and as they gain insights they encode them into the algorithmic trading platforms.

This is something that cannot be done with IBM Watson, which is a collection of tools from companies that IBM acquired. Without knowing the internals and being able to change the algorithms and the processing pipeline, the benefits will not surface. With IBM Watson and other solutions like it, you get a box of tools without the ability to change them. Scientists build their own tools for the experiment they are going to perform.

The people who are most active in applying scientific computing to problems may not think of themselves as data scientists. They are physicists, social scientists, biologists, or any researchers who have to manage and process large data to find the patterns and insights. They learn to create the tools and techniques by applying them to real problems.

If you are a business looking for a data scientist, you might not recognize one if they don’t identify themselves as such.

Where I have found the talent to do “data science” is among the research community of scientists that I know. They were right under my nose and I did not realize it. I know many scientists who work with data on a daily basis.

My advice: don’t look for a data scientist. Instead, find a scientist who knows how to work with data. They likely know many of the same tools that “data scientists” use, but they can also formulate the algorithms and create the tools, not just use the tools in existing packages.

If you have data, and you are not sure what insights might be buried in that data, and you don’t know where to find the people who can help, give us a call.