
Software, Interrupted

Read all 'Cloudera' posts in Software, Interrupted
September 4, 2009 3:33 PM PDT

Hadoop buzz continues to excite the cloud

by Dave Rosenberg

Hadoop is the popular open-source implementation of MapReduce, a powerful tool designed for deep analysis and transformation of very large data sets. It enables you to explore complex data, using custom analyses tailored to your information and questions. It's also one of the most buzz-worthy, talked-about open-source projects around.

Hadoop World

(Credit: Hadoop World)
I spoke with Christophe Bisciglia, Hadoop World organizer and founder of Cloudera, to ask some questions about this inaugural event. And by the way, if you're interested in attending, click on the link in the answer to question No. 5. (My readers get a 25 percent discount if they register before September 15.)

Q: How can you explain the buzz around Hadoop? It's deafening.

... Read more
June 22, 2009 8:58 AM PDT

CIA invests in open-source enterprise search

by Dave Rosenberg

If any organization needs to make sense of unstructured data, it's the government--especially agencies like the CIA and other intelligence groups that comb through myriad sources of disparate information on an hourly basis.

Last week, In-Q-Tel, the technology arm of the CIA, invested in Lucid Imagination, which provides support, maintenance, and add-on software for Apache Lucene and Solr. According to Lucid, the Lucene/Solr technology is downloaded more than 9,000 times per day, and more than 4,000 organizations are using the software for enterprise search.

I've wondered aloud quite a few times whether open-source projects (and specifically Apache projects) can turn into businesses or whether they are simply the cogs and wheels that make other products function better (aka the Oracle syndrome).

I probably would have argued that enterprise search would fall into one of those no-man's lands where the technology is important but not quite a standalone business. There has been a huge amount of venture capital investment in search but few big winners in the category.

But the investment from In-Q-Tel lends credence to the value of both the function and the technology, because the government is actually using the software rather than just placing a financial bet, as we see in the venture capital world. Lucene and Solr are "sufficiently complex" open-source products that require a commercial entity to support ongoing efforts once they are adopted. This gives Lucid a legitimate shot at building a business.

... Read more
June 1, 2009 5:03 PM PDT

Big data and Cloudera: Follow the money

by Dave Rosenberg

I recently asked Cloudera CEO Mike Olson how a commercial open-source company balances community and commerce.

When it comes to open source, this isn't Olson's first rodeo; in his past life he served as CEO of the open-source database company Sleepycat, which was acquired by Oracle in 2006. Olson understands the fragile balance that exists in open source; he's a firm believer that good community relations are critical for open-source companies. Case in point--since we last spoke, Cloudera launched the industry's first certification program for Hadoop and MapReduce, open-source projects that support data-intensive distributed applications.

Cloudera on Tuesday is expected to formally announce the closing of a $6 million series B funding round led by Greylock (whose past investment successes include Red Hat, among many others).

Olson reports that fast growth in the business and rapid adoption of Hadoop/MapReduce drove heavy interest from investors. For Cloudera, apparently it's a buyer's market, so it decided to secure funding now to allow it to expand the business rapidly on all fronts.

So, with $11 million in the bank from top-tier VCs (Accel led the A round and participated in the B) along with individual investments from Diane Greene (former CEO of VMware), Marten Mickos (former CEO of MySQL), and Jeff Weiner (president of LinkedIn), Cloudera has successfully raised the smart money to complement the big data all-star founding team from Google, Facebook, and Yahoo.

For a brief overview of Hadoop and Cloudera check out the video below.

... Read more
May 23, 2009 4:28 AM PDT

Balancing open-source community and commerce

by Dave Rosenberg

The tech media recently started taking serious notice of Hadoop, an open-source project developed to process huge amounts of data, and the coverage is growing every day. According to ITDatabase, 161 stories have been written about Hadoop in the last three months alone, including a veritable "coming out party" in The New York Times.

Hadoop is interesting because it's proven in use at large Web shops, it's cloud-oriented and open-source, and it solves two major computing problems: handling large amounts of data and writing parallel programs for large numbers of computers. Hadoop clusters can scale up to tens or hundreds of terabytes, or even petabytes.
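
To make the parallel-programming problem a little more concrete, here is a minimal sketch of the MapReduce pattern in plain Python. It is not Hadoop's API (Hadoop itself is written in Java), and the function names map_phase, shuffle, reduce_phase, and word_count are made up for illustration. Everything here runs in a single process; the point is only that each map call and each reduce call is independent of the others, which is what lets a framework like Hadoop spread them across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key (done by the framework in a real system)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big clusters"]))
```

In this toy version the input is a short list of strings; in a real cluster the same two user-written functions would be applied to blocks of files spread over many disks, with the grouping step handled by the framework.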

But adoption doesn't always equal commercial success. I've written in the past about Cloudera, a company formed to support Hadoop, and recently sat down with CEO Mike Olson to get his thoughts on the burgeoning Hadoop ecosystem and how the company intends to balance community and commerce.

My initial question for Olson: How does the company succeed when users are happy with the open-source project?

Olson answered with several key points. Cloudera sees "big data" -- terabytes at least -- becoming a common problem for all kinds of companies. The early adopters of Hadoop were all Web 2.0 companies generating logs and mining them for user behavior data. But data processing at this scale is also an enterprise problem. Enterprises aren't always early adopters, and they often require software to be supported by a vendor, not just a community.

Most enterprise buyers are very different from Facebook and Yahoo. They employ much smaller development and IT staff. They need strong SLAs and a quick response to problems from a vendor with deep expertise. Cloudera aims to solve those problems in ways that community support, mailing lists, and online forums can't.

This is typical of open-source projects that become more like products, and the challenge is ensuring that the project lives on and the commercialization efforts are balanced with good citizenship to non-customers.

The open-source community around Hadoop thus far appears to be pretty happy with Cloudera. The company has made its Cloudera Distribution for Hadoop available for free download, put a large amount of free training material on its Web site, and contributes to the open-source project with new features.

Good community relations are critical for open-source companies; getting this right is important for Cloudera.

Olson tells me that customers are running Hadoop in-house and, increasingly, in the cloud. A few weeks ago, Amazon even announced a hosted Hadoop offering called "Elastic MapReduce" -- more evidence that Hadoop has gone mainstream. From Olson's perspective, more Hadoop in the world means more demand for enterprise-grade services and support, and that creates a great opportunity for Cloudera to make life better for commercial users of the open-source project.

This is the key to maintaining the balance between commerce and community, and others will certainly pay attention to how Cloudera interacts with the Hadoop community to learn what works and what doesn't.

Follow me on Twitter @daveofdoom

May 15, 2009 8:56 PM PDT

Hadoop breaks data-sorting world records

by Dave Rosenberg

Hadoop

(Credit: Hadoop)

Yahoo's grid-computing team announced that Apache Hadoop broke world records in the annual GraySort contest in the Gray and Minute sorts in the general-purpose (Daytona) category.

Hadoop is the only open-source software to ever win the GraySort competition, adding another notch to last year's win at the Terasort competition, where Hadoop sorted 1 terabyte of data in 209 seconds. That beat the previous record of 297 seconds in the terabyte sort benchmark.

"Within the rules for the 2009 Gray sort, our 500 GB sort set a new record for the minute sort and the 100 TB sort set a new record of 0.578 TB/minute. The 1 PB sort ran after the 2009 deadline, but improves the speed to 1.03 TB/minute. The 62-second terabyte sort would have set a new record, but the terabyte benchmark that we won last year has been retired."
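
The figures above mix elapsed times (209, 297, and 62 seconds for 1 TB) with rates (0.578 and 1.03 TB/minute), so a little arithmetic helps put them on one scale. The sketch below is only a back-of-the-envelope conversion: the helper names are made up, the run times it prints for the 100 TB and 1 PB sorts are implied by the quoted rates rather than reported here, and it assumes 1 PB = 1,000 TB.

```python
def tb_per_minute(terabytes: float, seconds: float) -> float:
    """Convert a sort of `terabytes` finished in `seconds` to TB/minute."""
    return terabytes / (seconds / 60.0)


def minutes_at_rate(terabytes: float, rate_tb_per_min: float) -> float:
    """Elapsed minutes implied by sorting `terabytes` at a given TB/minute rate."""
    return terabytes / rate_tb_per_min


if __name__ == "__main__":
    # Terabyte sorts quoted in the post: previous record, last year's win, this year's run.
    for label, seconds in [("previous record", 297), ("last year", 209), ("this year", 62)]:
        print(f"1 TB, {label}: {tb_per_minute(1, seconds):.3f} TB/minute")

    # Run times implied by the quoted rates (assumes 1 PB = 1,000 TB).
    print(f"100 TB at 0.578 TB/minute: {minutes_at_rate(100, 0.578):.0f} minutes")
    print(f"1 PB at 1.03 TB/minute: {minutes_at_rate(1000, 1.03) / 60:.1f} hours")
```

On the same scale, last year's 209-second terabyte sort works out to roughly 0.29 TB/minute and the 62-second run to roughly 0.97 TB/minute, which is close to the rate quoted for the petabyte sort.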

If you want to learn more about Hadoop, the Cloudera blog has a great post titled 5 Common Questions About Hadoop that explains things pretty well.

March 11, 2009 7:59 AM PDT

Understanding MapReduce and Hadoop (Video)

by Dave Rosenberg

For those of you interested in just how cloud computing (and I do mean computing) works, check out this video from a recent AWSome Atlanta Cloud Computing User's Group. Twitpay's Don Brown explains how MapReduce and Hadoop, its open-source implementation, are used to process enormous amounts of data at Google and other large websites.

For more on MapReduce, check out these articles by Eugene Ciurana. For more on Hadoop (including support) check out Cloudera.

Via John M. Willis

December 24, 2008 9:36 AM PST

Cloud platforms of the future: Hadoop and Eucalyptus

by Dave Rosenberg

Without a doubt, the cloud, in all its forms and meanings, was big news in 2008. Besides the huge growth of Amazon EC2 and Google App Engine, we saw Salesforce launch Force.com, a true platform-as-a-service.

My picks for the most interesting software of 2008 are Hadoop and Eucalyptus.

Hadoop is an Apache project, the "open source implementation of MapReduce, a powerful tool designed for the detailed analysis and transformation of very large data sets," which basically means you can process a ton of data on commodity hardware.

Hadoop is going commercial through Cloudera, and while details are not publicly available, let's just say there are some very important and interesting foundations being laid for the way that people deal with computing and processing power.

... Read more

About Software, Interrupted

In "Software, Interrupted," Dave Rosenberg discusses disruption in the software market, as well as the products and services that keep business technology norms in perpetual flux.

With nearly 15 years of technology and marketing experience spanning from Bell Labs to multiple start-up IPOs, Dave co-founded open-source software company MuleSource and now serves as general manager of Hardy Way. He also happens to be a U.S. patent holder and a workaholic. Technology is his best friend and mortal enemy.
