Last week I reconnected with Jeff Jonas, chief scientist of the IBM Entity Analytics group and a recently named IBM Fellow, about what's going on in the realm of Big Data.
When I first met Jonas, back in June of 2010, he was focused on how companies were dealing with the deluge of information associated with Big Data. His focus hasn't changed, but he told me his perspective on how we make sense of data continues to evolve -- especially as demand shifts back and forth between real-time and batch data processing.
New Big Data tools make it much more affordable to gather and organize large sets of data that can be analyzed in their raw form. As advanced analytics applications are applied to that data, it becomes dramatically easier to identify direct cause-and-effect relationships between business events, regardless of which department is nominally in charge of an event or the process associated with it.
According to Jonas, the three V's -- volume, velocity, and variety -- are the essential characteristics of "Big Data" that will grow exponentially, rather than in a linear fashion. Accordingly, you have to plan for data growth in conjunction with any projects you plan to undertake.
But planning is just one aspect of the Big Data situation. More important is knowing what you want to get out of the data analysis you're performing. Making better sense of data is growing in importance for businesses of all kinds, but the techniques you employ depend on the problem you're trying to solve.
Jonas found that by default most organizations go with a batch approach, using tools like Hadoop and other MapReduce implementations. This approach works well for "thinking apps," where you are looking for information and context to inform a bigger notion or data amalgamation as opposed to a real-time decision. But there are many things that can and should be done in real time, primarily because batch/MapReduce processes deliver their decisions and suggestions too late.
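To make the distinction concrete, here is a minimal Python sketch of my own, not anything Jonas or IBM showed me. The event data, the `batch_totals` and `streaming_alerts` functions, and the 100.0 limit are all hypothetical, and the code is plain Python rather than Hadoop. The batch function can only answer once it has read everything; the streaming function raises a flag the moment a running total crosses the limit.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event: (account_id, amount). Not taken from any real system.
Event = Tuple[str, float]


def batch_totals(events: Iterable[Event]) -> Dict[str, float]:
    """Batch style: consume the whole data set, then return the totals."""
    totals: Dict[str, float] = defaultdict(float)
    for account, amount in events:
        totals[account] += amount
    return dict(totals)


def streaming_alerts(
    events: Iterable[Event], limit: float
) -> Iterator[Tuple[int, str, float]]:
    """Streaming style: update state per event and yield an alert as soon
    as an account's running total first exceeds `limit`."""
    totals: Dict[str, float] = defaultdict(float)
    alerted = set()
    for position, (account, amount) in enumerate(events):
        totals[account] += amount
        if totals[account] > limit and account not in alerted:
            alerted.add(account)
            yield position, account, totals[account]


# Made-up sample data for illustration only.
EVENTS: List[Event] = [
    ("a", 40.0),
    ("b", 10.0),
    ("a", 70.0),
    ("b", 5.0),
    ("a", 20.0),
]

if __name__ == "__main__":
    print(batch_totals(EVENTS))
    for alert in streaming_alerts(EVENTS, limit=100.0):
        print(alert)
```

Both functions see the same events. The difference is when the answer arrives: the streaming version flags account "a" at the third event, while the batch version says nothing until the last event has been read.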
Based on his experience at IBM, Jonas suggested that as the value of the analysis rises, business users will demand that data be delivered sooner -- even if it's not totally practical. The attitude becomes: "I need to know now, not later, what this data means to my business."
Ultimately, this all feeds into the work Jonas has been doing around puzzles and problem solving, changing the way we think about the context of the data. There may be more than one aspect to the problem. MoneyGram, for example, is using IBM Identity Insight for context accumulation -- weaving data together to address fraud complaints. Those complaints have dropped 72 percent since the company started using it, demonstrating the value of Big Data analysis in real business terms.
As a side note, Jonas is a big triathlete. He told me that if he completes the rest of the races he has planned this year, plus two more next year, he has been given the impression that he will be the third human being to have done every international Ironman event. Cheer for him as he swims/bikes/runs by.