Big data in context
A few weeks back I attended venture firm Accel Partners' New Data Workshop event and learned quite a bit about the state of what we are now commonly referring to as "big data" and the challenges that await the vendors trying to target this new way of slicing and dicing vast amounts of information.
One of the big takeaways for me was the realization that even with all of the processing power available nowadays, the amount of data is growing at such a rapid pace that people are simply looking to cope with the problem, rather than facing it head on.
The issue of processing large amounts of data is not necessarily new--most developers and IT staff can tell you about having too much information to deal with--but the big difference is that there are new approaches, tools and technologies that can help alleviate the difficulty in processing.
Over the course of the last 30 years or so, the way that machines process transactions has changed, but so too has the amount of data being processed and collected, which is now vast and increasingly handled with an eye toward real-time analysis of information.
This has led to the advent of a number of technologies that allow for data processing to be offloaded and managed in both structured and unstructured ways--examples include open-source projects like Memcached and Hadoop as well as NoSQL data storage mechanisms like Cassandra.
For the moment the blocking factor facing any and all of these technologies is the level of difficulty associated with using the software that wasn't initially designed to be used by less-skilled developers and IT staff.
Importantly, the rise of these big data products has introduced a new wave of vendors including Northscale (focused on Memcached and Membase), Cloudera (focused on Hadoop) and Riptano (focused on Cassandra) that have expertise in the technology and appear poised to ride the wave, assuming that enterprises are interested and willing to pay. (Disclosure: I am an advisor to Riptano.)
But there is no one specific technical approach that solves every problem, making it an interesting challenge for those attempting to implement these different products and also for the vendors trying to predict the future.
So while there are many ways to solve the problem, the important thing is the approach. Monolithic legacy technologies are being replaced with distributed computational resources (often compute clouds), and relational databases can be replaced, or perhaps offset, with NoSQL data stores that allow for extremely high volumes of transactions without requiring SQL-specific programmatic methods.
The question remains as to where these resources live: in a corporate data center or out on the Internet at providers like Amazon Web Services or Rackspace Cloud. Odds are a hybrid approach to processing and managing this data--both through internal compute clouds and public cloud offerings--is the path to success.
What remains to be seen is how quickly enterprises will make their way to the new world of big data, and how much consistency, in both application data and design, matters across the varied ways customers currently manage their information.
Dave Rosenberg dishes up "Software, Interrupted" with nearly 15 years of technology and marketing experience that spans from Bell Labs to multiple start-up IPOs to open-source enterprise software companies. He is co-founder of MuleSource and currently serves as the general manager of Hardy Way. He is an adviser to Canonical, IT Database, Puppet Labs, Riptano, and SOBA Labs. He is a member of the CNET Blog Network and is not an employee of CNET. You can contact Dave via e-mail at softwareinterrupted@gmail.com or follow him on Twitter @dr138.

"For the moment the blocking factor facing any and all of these technologies is the level of difficulty associated with using the software that wasn't initially designed to be used by less-skilled developers and IT staff."
We have designed RainStor to use "traditional" SQL and ODBC/JDBC access methods to reach data within the RainStor repository, thereby solving the major issue of having to learn yet another new paradigm. Our repository provides all of the efficiencies of Big Data retention through a significant reduction in physical data size, while remaining compatible with, and instantly integrable into, enterprise applications.
- by andy_venu August 11, 2010 6:08 PM PDT
- Dave, thank you for this "state of the union" article. An excellent point you bring up is: "But there is no one specific technical approach that solves every problem, making it an interesting challenge for those attempting to implement these different products and also for the vendors trying to predict the future." That, I believe, is the increasingly important role of a neutral consulting and services company such as my employer, Impetus Technologies, where our Big Data consultants and architects help enterprises choose and eventually implement the right product/solution mix from the variety of commercial and open-source options available. We recently helped a large company ($5B+) on the East Coast put together a cloud solution based on Cassandra. We work with product vendors as their partners or, in some cases, directly for the end customer, bringing the product vendor into the enterprise customer after our analysis phase.