MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
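To make the map and reduce roles concrete, here is a minimal single-process sketch of the classic word-count example. It is only an illustration of the programming model, not how Qizmt, Hadoop, or Google's implementation work internally; the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and a real framework would distribute the grouping step across many machines.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: take one (document name, text) pair and emit (word, 1) pairs."""
    for word in value.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: merge all intermediate values that share the same key."""
    return sum(values)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical in-memory driver; stands in for the distributed runtime."""
    groups = defaultdict(list)
    # Map phase: collect intermediate pairs, grouped by intermediate key.
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    # Reduce phase: one reduce call per distinct intermediate key.
    return {ikey: reduce_fn(ikey, ivalues) for ikey, ivalues in groups.items()}


docs = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts["the"], counts["fox"])  # prints "3 2"
```

The user writes only the two small functions; everything about grouping and scheduling lives in the driver, which is the part a framework takes over.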
I spoke with Java architect and distributed systems expert Eugene Ciurana about MapReduce. He contends that "indexing large amounts of unstructured data is a difficult task regardless of the technologies involved. MapReduce provides a simple, elegant solution for data processing in parallelized systems."
As more sites take on large data sets, the uptake of frameworks like MapReduce and projects like Hadoop is sure to grow, and the market opportunity grows along with the data. Open source is a great way to broaden adoption as users figure out the best way to use these new tools.
Qizmt is currently being used in the MySpace "People You May Know" feature, and will soon expand to user recommendations and other new areas.
Follow me on Twitter @daveofdoom.