One of the most-talked-about open-source projects is holding its second annual Hadoop World Conference next month in New York. On the heels of a successful inaugural event, the 2010 conference promises more than 25 presentations from the likes of Bank of America, eBay, HP, Orbitz, Twitter, Facebook, and Yahoo. Also, for the second year running, here is a code for my readers to get a 20 percent registration discount: CNETHW2010.
To provide a small taste of what the event will offer, I corresponded with Hadoop World speaker Linden Hillenbrand, product manager of Hadoop Technologies at General Electric, to get an idea of how GE leverages Hadoop and the use case he'll be presenting at the show. Hillenbrand has been using Hadoop for six months, starting with distribution 18 from Cloudera.
What can attendees expect to learn about Hadoop from your presentation at Hadoop World?
Hillenbrand: Hadoop has enabled us to meet a critical business need. The solution is driven by a complex algorithm and powered by Hadoop. Attendees will be able to fully grasp the computational power of Hadoop, and how a traditional RDBMS and a Flex front end can be combined with Hadoop to produce tremendous results with a fantastic user experience.
What are you using Hadoop for at GE?
Hillenbrand: We are currently running several use cases on our Hadoop cluster, drawing on a number of disparate data sources to produce our results. Along with sentiment analysis, we are running Web analytics on our internal cloud structure, looking at load usage, user analytics, and failure mode analytics. We have built a recommendation engine for our intranet that suggests press releases users might be interested in, based on their function, user profiles, and prior visits to parts of our site. We are also working with several types of remote monitoring and diagnostic data from our energy and wind businesses.
Are there applications for Hadoop in the back office? In your IT operations? Back testing models? What others?
Hillenbrand: As a company focused on quality and Lean strategies, we see operations as a huge driver. Hadoop currently plays a large part in our analytics of Web logs from our internal support teams: identifying failure cases, more robust logging for user analytics, and more stringent rules and correlations based on business need.
How do you solve those problems today?
Hillenbrand: These problems are addressed by several different legacy systems. Everyone in this industry has the same issues: they have standardized processes that have long been, and still are, in constant use. However, the volume and scope of the data those processes cover keep growing. This is where we are leveraging Hadoop, as an addition to those legacy systems.
What keeps you from using Hadoop more?
Hillenbrand: Three things: management tools, underdeveloped utilities, and constantly evolving demands on our experts.
What benefits do you see from Hadoop?
Hillenbrand: Further enabling our analytic capabilities: looking at data sets never examined before because of their sheer size. Further insights into business-critical data: combining data from multiple data sources and running computationally complex algorithms across those data sets. Lower costs alongside higher performance, plus a scalable, fault-tolerant system built on local storage.
What did you use before Hadoop?
Hillenbrand: Multiple tools, including Oracle, SAS, MySQL, and Informatica, all of which are still in use. Hadoop becomes an enabler for them, not a replacement.
What are you hoping to get out of your time at Hadoop World?
Hillenbrand: I'd like to understand how other companies are leveraging the technology, share best practices across the industry, and learn how to solve more complex problems. New technology brings great benefits, but it also raises a new set of issues that need to be addressed. This is the perfect forum for solving these problems.