MapReduce

Google vows not to sue over certain patents for open source

Google today is "taking a stand on open source and patents," vowing not to sue anyone on specified patents unless first attacked.

The company, which today announced its Open Patent Non-Assertion Pledge, said that, to start with, it has identified 10 patents related to MapReduce, a model for processing large data sets. It has pledged not to sue any user, distributor, or developer of open-source software based on patents related to MapReduce.

Duane Valz, Google senior patent counsel, said in a blog post that Google wants to ensure open source software remains open:

"At Google we believe that … Read more

Where IT is going: Cloud, mobile, and data

Cloud computing often seems to be used as a catch-all term for the big trends happening in IT.

This has the unfortunate effect of adding further ambiguity to a topic that's already laden with definitional overload. (For example, on a topic like security or compliance, it makes a lot of difference whether you're talking about public clouds like Amazon's, a private cloud within an enterprise, a social network, or some mashup of two or more of the above.)

However, I'm starting to see a certain consensus emerge about how best to think about the broad sense … Read more

IBM Fellow Jeff Jonas on the evolution of Big Data

Last week I reconnected with Jeff Jonas, chief scientist of the IBM Entity Analytics group and a recently named IBM Fellow, about what's going on in the realm of big data.

When I first met Jonas, back in June of 2010, he was focused on how companies are dealing with the deluge of information associated with Big Data. His focus hasn't changed, but he told me his perspective on how we make sense of data continues to evolve -- especially as we move in and out of demand for real-time versus batch data processing.

New Big Data tools … Read more

Big data in context

A few weeks back I attended venture firm Accel Partners' New Data Workshop event and learned quite a bit about the state of what we are now commonly referring to as "big data" and the challenges that await the vendors trying to target this new way of slicing and dicing vast amounts of information.

One of the big takeaways for me was the realization that even with all of the processing power available nowadays, the amount of data is growing at such a rapid pace that people are simply looking to cope with the problem, rather than facing it head on.

The issue of processing large amounts of data is not necessarily new--most developers and IT staff can tell you about having too much information to deal with--but the big difference is that there are new approaches, tools and technologies that can help alleviate the difficulty in processing.

Over the course of the last 30 years or so, the way that machines process transactions has changed, and so has the amount of data being processed and collected, now with an eye toward real-time analysis of information.

This has led to the advent of a number of technologies that allow for data processing to be offloaded and managed in both structured and unstructured ways--examples include open-source projects like Memcached and Hadoop as well as NoSQL data storage mechanisms like Cassandra.… Read more

Could open source abandon the Google train?

As arguably the world's largest open-source company, Google has a big stake in maintaining its place at the heart of the open-source ecosystem. Recent events, however, suggest that Google can't rest on its laurels if it wants to secure the hearts and minds of open-source developers.

Make no mistake: Google needs those developers. Android, Chrome (and Chrome OS), and other Google initiatives depend upon fostering vibrant open-source communities that can help it to surpass Microsoft and Apple.

Such communities may be ready to cut the Google umbilical cord, however, which should be worrying to Google.

There have been … Read more

MySpace to open source data processing

MySpace today announced a new open-source project called Qizmt, a distributed computation framework developed by its data mining team.

Qizmt is based on the MapReduce distributed processing framework, well-known as a core part of Google's search indexing infrastructure. Qizmt, however, runs on large clusters of Microsoft Windows servers, an interesting sidebar to a computing style we most commonly associate with commodity Linux machines.

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, … Read more
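As a rough illustration of the model described above, here is a minimal single-machine sketch in Python of a word count. It is not Qizmt's or Google's API: the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical, and the reduce step (merging all intermediate values that share a key) comes from the general MapReduce model rather than from the truncated text above. A real framework would spread the map and reduce calls across a cluster.

```python
from collections import defaultdict


def map_words(key, value):
    """Map step (hypothetical): take one (document name, text) pair
    and emit an intermediate (word, 1) pair for every word."""
    for word in value.lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce step (hypothetical): merge all intermediate values
    that share the same intermediate key."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Toy in-memory driver: run the mapper over every input pair,
    group the intermediate pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


# Made-up sample input for illustration only.
documents = [
    ("doc1", "big data big clusters"),
    ("doc2", "data on clusters"),
]
word_counts = run_mapreduce(documents, map_words, reduce_counts)
print(word_counts)
```

The grouping done inside `run_mapreduce` stands in for the shuffle phase that a distributed implementation performs between the map and reduce stages.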

Hadoop buzz continues to excite the cloud

Hadoop is the popular open-source implementation of MapReduce, a powerful tool designed for deep analysis and transformation of very large data sets. It enables you to explore complex data, using custom analyses tailored to your information and questions. It's also one of the most buzz-worthy, talked about open-source projects around.

I spoke with Christophe Bisciglia, Hadoop World organizer and founder of Cloudera, to ask some questions about this inaugural event. And by the way, if you're interested in attending, click on the link in the answer to question No. 5. (My readers get a 25 percent discount if they register before September 15.)

Q: How can you explain the buzz around Hadoop? It's deafening. … Read more

More universities join Yahoo for Net-scale research

Yahoo has signed up three new universities to participate in Internet-scale computing research, the Internet pioneer said Thursday.

The University of California-Berkeley, Cornell University, and the University of Massachusetts-Amherst have joined an effort that already included Carnegie Mellon University, Yahoo said. The universities get access to a cluster of Yahoo computers called M45, which runs open-source software called Hadoop that can be used to process data rapidly.

Yahoo is a major contributor to Hadoop, a project within the Apache Software Foundation's collection, but Google created the underlying technology through its MapReduce algorithm. MapReduce and Hadoop can be used … Read more

Amazon launches Hadoop data-crunching service

This was originally posted at ZDNet's Between the Lines.

A correction has been made to this story. See details below.

Amazon on Thursday announced a new cloud computing service that uses Hadoop, a free software framework, to crunch tons of data.

The service, called Amazon Elastic MapReduce, is designed for businesses, researchers and analysts trying to conduct data-intensive number crunching. Hadoop, which is used by companies like Yahoo, is being pushed into the enterprise data center by start-ups like Cloudera.

Correction, 7:15 a.m. PDT: This story initially miscast Google's connection to Hadoop. … Read more

Understanding MapReduce and Hadoop (Video)

For those of you interested in just how cloud computing (and I do mean, computing) works, check out this video from a recent AWSome Atlanta Cloud Computing User's Group. Twitpay's Don Brown explains how MapReduce and its open-source implementation Hadoop are used to process enormous amounts of data at Google and other large websites.

For more on MapReduce, check out these articles by Eugene Ciurana. For more on Hadoop (including support) check out Cloudera.

Via John M. Willis

You can follow me on Twitter @daveofdoom