The Open Road

Read all 'Hadoop' posts in The Open Road
September 17, 2009 8:57 AM PDT

Q&A: Visa dips a toe into the Hadoop pool

by Matt Asay

As cloud computing edges its way into the enterprise, the open-source Apache Hadoop project may well prove to be the poster child of the movement. Hadoop effectively gives enterprises the power of Google or Yahoo Web indexing for free, or for the cost of a Cloudera subscription if you want to involve Hadoop's core developers in your rollout. Credit card giant Visa is an early corporate adopter of Hadoop, and points to a bright future for the open-source project.

I caught up with Visa's Joe Cunningham, head of the technology strategy and innovation group, to talk about the company's adoption of Hadoop.

Q: What got you interested in Hadoop initially and how long have you been using Hadoop?
Joe Cunningham: It's early days for us here at VISA for Hadoop. It's still very much classified as a research and development activity.

My role is the head of technology strategy and research and development for the company. Our task is to look outside the company for interesting technologies on the landscape and identify potential opportunities for those technologies to add value to either the VISA business or VISA technology and then bring them in and play with them in our lab research environment until they are ready for mainstream or commercial activity.

Hadoop is one of those technologies we've been looking at for about a year and we think it offers certain value as an augmentation to existing systems and capabilities VISA has.

Q: How do you use Hadoop at VISA? What made you think it could be the best solution for what you're trying to accomplish?
Cunningham: The most important thing to remember is VISA obviously has a heritage of offering very large, very scalable, very reliable, and very secure services to the payments industry. And we're continuously trying to innovate and make those services more valuable to our clients and ultimately to cardholders.

We have a data challenge we attempt to meet every day in terms of the number of transactions we handle and therefore we think there's an opportunity to look at combining the skills VISA already has in the data analytics space with the power of Hadoop to handle very, very, very large volumes of data.

To put that in context, we handle approximately 200 million transactions a day at VISA. That works out to be about 8,000 transactions a second, and with that comes huge volumes of data and Hadoop offers the potential to harness some of that along with some of our existing capabilities to extract more value from those transactions.
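A quick back-of-the-envelope check of those two figures, assuming transactions were spread evenly across a 24-hour day (my assumption, not something Cunningham said): 200 million a day averages out to roughly 2,300 a second, so the 8,000 figure presumably describes peak load rather than the daily mean. The function names below are only for illustration.

```python
# Back-of-the-envelope check of the quoted Visa figures.
# Assumption: load is spread evenly over a 24-hour day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_per_second(transactions_per_day: int) -> float:
    """Average transactions per second for a given daily total."""
    return transactions_per_day / SECONDS_PER_DAY


def daily_total_at_rate(transactions_per_second: int) -> int:
    """Daily total if a per-second rate were sustained all day."""
    return transactions_per_second * SECONDS_PER_DAY


avg_tps = average_per_second(200_000_000)
implied_daily = daily_total_at_rate(8_000)

print(f"{avg_tps:,.0f} transactions/second on average")
print(f"{implied_daily:,} transactions/day at a sustained 8,000/second")
```

Either way, the point of the quote stands: this is a very large stream of data.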


Q: Are there particular directions in which you'd like to see Hadoop evolve?
Cunningham: I think we're interested in looking at Hadoop and looking at its evolution over time. We're certainly interested in how the Hadoop community continues to operate in this open-source environment.

My specific interest is how can Hadoop evolve from the alpha/beta environment in which it is today to the mainstream and how can we continue to integrate it as a mainstream technology with all the existing platforms we have here at VISA.

I'll give you two examples. The operations management space is very important to us: how we guarantee the reliability and security of our systems and how Hadoop can be merged or integrated into that environment. And secondly, and I guess this is a common question, but how can we enable SQL-like access to some of the data via the Hadoop file system or via the Hadoop engine?

Q: Given that it's still early days for Hadoop at Visa, it's interesting that you're speaking at the upcoming Hadoop World conference, along with JP Morgan Chase, China Mobile, and other Hadoop users that may be further along the adoption curve. What are you going to be talking about?
Cunningham: I plan to talk about the application stream, so I'll be taking a business-focused view of how we see Hadoop offering value to Visa. If there are tech junkies in the room, they are probably not going to be as interested in what I talk about.

I plan to spend a little bit of time showcasing Visa's technology today to set the scene. I will talk a little bit about our research and development function and how it works with the rest of Visa. Then I'll spend some time expanding on what I call our information products business.

The information products business for Visa offers services to our clients that are, obviously, information-based. So, some of the use cases where we see Hadoop potentially offering value in the future are in the areas of transaction analysis (particularly for risk products and the modeling of risk scenarios), fraud analysis (assisting our clients in potentially managing fraud more carefully), and in the loyalty space where Visa offers services on behalf of our clients to cardholders.

There's an opportunity for us to combine the power of Hadoop with data analytics capabilities that Visa has to augment those services and products on behalf of our clients.

In fact, that's an area where I'm hoping to learn a lot at the event. In some industries, Hadoop is very much mainstream but for some others, it's still emerging and I'm trying to understand whereabouts on that hockey stick or [Gartner] Hype Cycle Hadoop is, or whether Hadoop is already mainstream and it's just a matter of us catching up.

It's always good to gauge and plot that evolution. I think you need to get to these events and talk to other companies and key leaders in the community to really understand where we fit and what we should be doing next at Visa.


For those interested in attending Hadoop World in New York, the organizers are giving Open Road readers a 25 percent discount if you register by September 24.

June 12, 2009 9:00 AM PDT

The more Hadoop grows, the better Cloudera looks

by Matt Asay

The Internet largely abolishes scarcity in digital goods, shifting competitive advantage to those that can profit from abundance, not scarcity, like Red Hat, Google, and Facebook. For this reason, the more Hadoop grows as a community, the better the business opportunity for Cloudera, the start-up that distributes a commercial version of Hadoop.

Let me explain.

As CNET's Tom Krazit explains, "Hadoop is essentially an open-source version of the software Google uses to run its Web indexing servers." Yahoo also uses it internally for roughly the same reason, and has released its own open-source version of Hadoop to nudge adoption by other firms and to encourage contributions to the Hadoop project.

As Savio Rodrigues points out, however, Hadoop is already getting significant contributions from outside Yahoo. While the project was initially dominated by Yahoo employees, Rodrigues points to recent data indicating that 70 percent of Hadoop's community isn't employed by Yahoo.

That's great progress for Hadoop, and it's also great for Cloudera, the company that aims to make Hadoop relevant and useful for companies that lack the scale of a Google or Yahoo. Cloudera actively contributes to the Hadoop project, but perhaps its greatest contribution is in providing a commercial distribution of Hadoop.

The more contributors to Hadoop and the more complex it becomes, the greater the need for a Cloudera to provide a conservative, trusted distribution of Hadoop for enterprise customers. In other words, the greater the abundance of community around Hadoop, the more enterprises need scarcity: one throat to choke for their Hadoop deployments, not many.

As Yahoo and others contribute heavily to Hadoop, in short, they're also contributing to the likelihood of Cloudera's success.


Follow me on Twitter @mjasay.

June 8, 2009 2:26 PM PDT

Joomla! turns 10,000,000 and other news

by Matt Asay

I thought of just Tweeting a few of these news bits, but some deserve to be blogged. Alas! I lack the time today but....

  • Joomla has surpassed 10,000,000 downloads. It's hard to describe just how impressive this is, particularly given that these downloads have come in the past four years, and after a fractious fork from Mambo.
  • The University of Southern Mississippi and the Department of Homeland Security's Science and Technology Directorate have launched the Homeland Open Security Technology (HOST) program, along with Open Source Software Institute (OSSI) and the U.S. Navy, to invest $1.5 million in the development of open-source technology.
  • A new report from Gartner suggests 42 percent of CIOs surveyed chopped their IT budgets in the first quarter of the year. Less budget almost certainly will mean more open source.
  • MindTouch (Disclosure: I am an advisor to MindTouch) CEO Aaron Fulkerson asks what's the big deal with Google's new Wave collaborative platform, given that wiki technology (like MindTouch's Dekiwiki) has been "waving" for years. Rafael Laguna, Open-XChange CEO, agrees. I still think Wave is cool.
  • Speaking of Google, the company recently launched Page Speed, an open-source Firefox add-on that "web developers can use...to evaluate the performance of their web pages and to get suggestions on how to improve them." Google continues to demonstrate ever-stronger commitment to feeding the open-source community.
  • Given that no one has yet settled on the optimal open-source code contribution model, MySQL developer Brian Aker discusses the Drizzle fork of MySQL and how he and the project team are handling third-party contributions to it. Very interesting insight into code contribution policies, copyright assignment, etc.
  • Acer, meanwhile, is crippling its Android Netbooks by having them dual-boot Windows. I don't have anything against Windows (well, that's not really true...), but this seems like an exercise in futility. If customers want Windows, give it to them. If they want the lower price (and different experience) of Linux, give that to them. But don't give them both, or they'll likely revert to Windows out of sheer habit.
  • Cloudera CEO Mike Olson indicates that Web applications are just the beginning for Hadoop. Indeed, Cloudera's easier-to-use commercial version of Hadoop is doing so well that Cloudera had to raise another $6 million just to keep up. Fortune, for one, thinks that Hadoop might be perfect to help power the electrical power grid.
  • Back in Redmond, Microsoft is coming under increased pressure from the European Commission, reports The Register, which may force Microsoft to offer rival browsers with Windows. Microsoft probably feels pretty beleaguered, but Roy Schestowitz offers up some data that indicates it's spending its free time pressuring European groups to side with it. It doesn't seem to be working.
  • Finally, Oracle executives didn't mince words in a town hall meeting with Sun employees, stipulating that some tough choices will be made about Sun technology and personnel. Indeed.

March 16, 2009 6:07 AM PDT

Cloudera harnesses Hadoop for the enterprise

by Matt Asay

The industry's premier Web players--Google, Yahoo, Microsoft, and Facebook--agree on at least one thing: the future is cloud computing, and Hadoop is the engine to power the cloud.

Cloudera, the company set up to harness the power of Hadoop for the enterprise, on Monday released its first commercial product, the Cloudera Distribution for Hadoop.

The Web, and how enterprises use it, will never be the same.

If I sound a little giddy, it's because I am. I think that Cloudera's distribution for Hadoop is one of the biggest things to happen to the enterprise ever because it opens up the world's biggest processing engine, the Web, to the average enterprise. As Ashlee Vance wrote recently in The New York Times, "the analytical powers of Hadoop can benefit a whole new class of businesses," ranging from companies specializing in biotechnology to oil and gas.

Hadoop makes the Web digestible for the enterprise. Or, rather, it makes the enterprise digestible by the Web. Hadoop is derived from Google's innovative work with MapReduce, the technology that enables Google's massive Linux server farm to run efficiently and at peak performance.
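For readers who haven't seen the MapReduce model, here is a minimal single-machine sketch in Python. It is not Hadoop's API or Google's code: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the web is big", "the enterprise is not"])
print(counts)
```

Because each map call touches only its own document and each reduce call touches only its own key, the work can be split across as many machines as you have, which is the whole appeal.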

When Yahoo discovered what Google was doing with MapReduce, it put a massive development team on Hadoop to remove any advantage Google might have had by its early adoption. The story, which you can watch on YouTube, is pretty amazing.

Doug Cutting, the brilliant engineer behind both Hadoop and the open-source Lucene search engine, didn't open-source Hadoop in order to win plaudits with open-source monastics. In talking with Cloudera, it's clear that Cutting's purpose was simply to show other developers how to do Hadoop right, and the currency of development is code, not talk. So he open-sourced it.

Millions of dollars of investment by Google and Yahoo later, you can download Hadoop for free, including Cloudera's enterprise distribution, which offers a complete system to handle the processing and storage of big data. It's like putting the Web at your company's beck and call.

Best of all, Cloudera, in quintessential open-source fashion, charges customers for support and other add-on value. The price of entry, therefore, is $0.00.

Cloudera just raised $5 million in a Series A funding round led by Accel Partners and including open-source luminary Marten Mickos. But the real money should come as enterprises put the Web to work using the Cloudera Distribution for Hadoop.

You can view a screencast on how to configure the distribution online. Or you can simply download the software and get started. That's the power of open source, and particularly the power of open-source Hadoop: harnessing the power of the Web, with Cloudera available to facilitate--and not slow down--the process.


December 30, 2008 6:37 AM PST

Open-source integration: No vendors required

by Matt Asay

Over the Christmas break, I've watched one of the basic powers of open source in action. Two employees from my Alfresco team did something that is largely impossible in the proprietary world:

They wrote integrations to third-party open-source software, the Apache Hadoop and Drupal projects. No contracts changed hands. No NDAs. Just code.

Open source, of course, is a great way to get one's code in the hands of would-be customers, and then sell them support or other add-on services or software. But it's also a fantastic way to collaborate with would-be partners. Not a single lawyer need get involved until the code is working, and then only to divvy up responsibilities and revenue, if you so choose.

Try the above integrations between two proprietary companies. First you get contacts from both companies (probably the executives, depending on the size of the company, because who has authority to make that kind of a decision?) to start talking about the integration. Then, before any real work happens, the lawyers need to get involved. (While at Novell, I had one of the most distressing experiences in my life trying to negotiate a partnership with Siebel. It's not an experience I'd wish on my worst enemy, much less a partner.) Further work will then need to be done to define the integration, marketing teams will need to get involved to define the go-to-market strategies and whatnot. And so on, until eventually code actually gets written, a year or so later.

With open source, you just need one guy and a week or two of downtime over Christmas. With proprietary software, you need a small army. Which do you think is the more efficient model?

October 14, 2008 12:39 PM PDT

Microsoft cleared to commit code to Apache

by Matt Asay

Few will have noticed, but Microsoft's Jim Kellerman just announced that he and a Microsoft colleague have "been cleared to contribute patches again" to Apache, and specifically to the Hadoop project.

This is great news for Microsoft, and I think for open source generally. It means that Microsoft just became an open-source insider and may find it more difficult to sling mud as an open-source outsider in the future.

It's also good to have Microsoft's heft behind the Hadoop project, an incredibly cool open-source project that got additional help from Cloudera, a new open-source company helmed by former Sleepycat CEO Mike Olson that promises to help companies tap into the power of Hadoop. Who cares about Hadoop? Any Web developer who wants to "write and run applications that process huge amounts of data."

Microsoft gets deep into open source and Olson comes out of retirement. This is turning out to be a Very Great Day.

August 11, 2008 1:07 PM PDT

Hadoop: For those who actually want to get work done

by Matt Asay

I loved this post in The Register about Doug Cutting and his Hadoop open-source project. I've written about Hadoop before and the vision it shows on Yahoo!'s part, but El Reg is having none of that, castigating companies and projects like Twitter that release parts of their software as open source for media, not business, effect:

Twitter, which is widely accepted as the drum major of the Web 2.0 failure parade, released an open source project called Starling in January of this year. Starling is the Ruby-based messaging system that runs Twitter's backend. Yes, Twitter, the nonprofit web service known widely for its downtime, dropped its disaster-producing xxxxpile on the world. Why? Maybe they thought more competent developers would fix their problems. The more likely scenario is that they wanted to get [a boost] from the fake tech media to make themselves look more important. I am guessing this is why no code has been released for Starling since it was open sourced. Oops.

Twitter decided they would be cute and trendy. They wrote their code in Ruby: the official state language of the hipster-developer nation. Doug Cutting, on the other hand, decided he would get xxxx done, and wrote Hadoop in Java. Starling was hidden away in some corner and forgotten (it's hosted at RubyForge...). Hadoop lives prominently at the Apache Software Foundation. Starling is a re-hash of an existing Java Enterprise API called JMS that has several open source implementations. Hadoop is an implementation of Google's MapReduce, a system that publicly only existed on paper. Hadoop has the added benefit of actually working.

Winner? Hadoop. Sometimes open source is cool because it's cool. Other times it's cool because it's useful (but hard). I like the latter kind.

Please head over to The Register and read the full article. One of the best I've read in a long while, the language notwithstanding. It's a great reminder to those of us in the open-source world prone to hubris, and it didn't even come from Savio. :-)

March 15, 2008 6:36 AM PDT

Open source is in our DNA, argues Yahoo! exec

by Matt Asay

I once took Jeremy Zawodny, technical director at Yahoo!, to task for not contributing enough back to open source. Today, Zawodny made it clear that openness and open source are in Yahoo!'s DNA. It is a trend that started long ago, Zawodny writes, and will only accelerate over time:

We've been on the openness road for a long, long time at Yahoo. And we take it rather seriously. Some times it hasn't been as visible as others, but believe me, the trend is quite clear when you look at all the data. The Open Source adoption and work. The APIs. The way we communicate with users and partners. The Blogs. The RSS feeds....

August 21, 2007 10:15 AM PDT

Yahoo open-sources Google

by Matt Asay

This is a fascinating read from Baseline. I heard a bit about Hadoop and other Doug Cutting Lucene projects during the O'Reilly Executive Radar session at OSCON last month. Hadoop is "an open-source project that aims to replicate Google's techniques for storing and processing large amounts of data distributed across hundreds or thousands of commodity PCs."

Sounds juicy, doesn't it? Especially in Yahoo's hands.

Tim O'Reilly gets this move exactly right: Yahoo is using open source in the Web 2.0 world in the same way that HP and other traditional software companies have used it in the packaged software world:

As a club to undermine competitors while blessing customers and developers.

O'Reilly writes:

...

About The Open Road

Matt Asay brings a decade of in-the-trenches open-source business and legal experience to the Open Road, with an emphasis on emerging open-source business strategies and opportunities. Matt is general manager of the Americas division and vice president of business development at Alfresco, a company that develops open-source software for content management. He is a member of the CNET Blog Network and is not an employee of CNET.
