Google is arguably the world's largest open-source company, not only releasing a minimum of 14 million lines of open-source code but also hosting over 250,000 open-source projects on Google Code, in addition to its open-source advocacy work like Summer of Code.
Despite these open-source bona fides, it's still surprising to see someone at Google adopting Solr, an open-source search server based on Apache Lucene, for its All for Good site.
Google is the world's search market leader by a very long stretch. Why not use its own search technology? Why use Solr?
Google's Public Sector team suggested an answer last week:
One of the top concerns we've been hearing from nonprofit organizations who list volunteer opportunities on All for Good is that their opportunities aren't updated on the site as frequently as they need. This happens because...we crawl feeds from partners like VolunteerMatch and Idealist just like Google web search crawls web pages. Crawlers don't immediately update, they take time to find new information.
Today, we're rolling out improvements to All for Good that will help solve this problem and improve search quality for users. The biggest change, which you won't see directly, is that our search engine is now powered by SOLR, an incredible open source project that will allow us to provide higher quality and more up-to-date opportunities. Nonprofits should start seeing their opportunities indexed faster, and users should see more relevant and complete results.
I don't think this means that Google thinks Solr provides better results than its own code. Rather, I suspect this was simply a case of a Googler using her 20 percent "free" time to get a job done. It was likely easier to roll a service using Solr than to get official approval from Google to use its search technology for an important but nonprofit purpose. (My request for comment by Google had not been answered at the time of this post's publication.)
To me, this says much about the power of Google's culture: Googlers appear to be free to use the best tool to get a job done, which may not always be the best technology, per se, but simply the most easily available technology for a given project at a given time.
The decision also says a tremendous amount about the value of open source, and of Solr in particular. If it's good enough for Google, as David Fishman notes, it's probably going to be just fine for you, too.
Update, 11:41 a.m. PDT: I heard back from Chris DiBona, open source and public sector program manager at Google, who offered this reasoning behind the move, in response to the suggestion that Google uses Solr:
I think you meant "Googler chooses Solr." You see, Allforgood.org is run by Our Good Works, a non-profit that works with technology companies and the whitehouse on that site. I'm on the Board of OGW, but it is run by Jonathan Greenblatt.
That said, we chose Solr because it made sense for the project we had. We want other companies/countries to be able to use the code we've written for Allforgood.org and to have it depend too heavily on Google Base precluded that, but specifically, technically speaking Solr fit the problem better than Google Base did.
So, it's not accurate to say that "Google chose Solr," but it is accurate to suggest that All for Good was founded by Googlers in their "20-percent time" and continues to be hosted by Google, as TechCrunch has reported, and that those Googlers, along with the rest of the board, opted for Solr over Google.
As DiBona mentions, and as I blogged above, this is a reflection of fit-for-purpose, and not any problem with Google's code. All for Good is completely open source, so it makes sense that it would opt for open-source Solr over Google Base.
Follow me on Twitter @mjasay.
Linux users are known for being a somewhat finicky lot. Despite broader application support for Windows and a better user experience in Mac OS X, Linux "desktop" users swear by the open-source operating system (and sometimes swear at its competitors).
It's therefore somewhat telling that Linux users overwhelmingly choose Google as their preferred search engine, according to data released today by Chitika, an online advertising network. Chitika analyzed data from 163 million searches across its advertising network between July 30 and August 16, and came up with the following:
(Chart credit: Dan Ruby, Chitika)
Despite the concerns about Google and privacy and despite Microsoft's rising relevance in search through its Bing "decision engine," Google wins over Linux users 94.61 percent of the time. While it's not surprising that Linux users would shun a Microsoft-sponsored search engine, it is surprising that they so heavily congregate around just one search engine.
After all, this is the crowd that has created (literally) thousands of Linux distributions. For a community so devoted to choice, it's telling that such a disparate community would unify on Google search. Perhaps Yahoo's apparent willingness to prostrate itself before Microsoft has turned off the Linux crowd, but there are other alternatives.
Open source, after all, is all about alternatives. There are open-source alternatives to Google Analytics (Piwik, Open-Tube, etc.), Google Search Appliance (Lucene/Solr), Google Docs (OpenGoo), Google Earth (World Wind), and more.
But for search, the Linux contingent of the open-source community seems settled on Google.
It must be depressing to be Microsoft these days.
You spend $1.2 billion to acquire enterprise search leader FAST in January 2008 and then another $100 million on semantic search vendor Powerset in July 2008, only to have the excellent Apache Lucene, an open-source search project, and Solr, an enterprise search server based on Lucene, offer better performance at a 100 percent discount.
Not very sporting of the open-source community, now is it?
Granted, Lucene and Solr still lack some of the spit and polish that Microsoft FAST, Autonomy, Google's enterprise search appliance, and other proprietary competitors offer, but this isn't slowing their adoption. As CMS Watch's Kas Thomas notes, interest in Lucene and Solr is skyrocketing, as measured by job postings (among other data):
Indeed.com job postings suggest high interest in Solr and Lucene
This was OK when Lucene stood alone, relatively rough and unadorned by Solr. But Solr makes Lucene much more palatable to enterprises that worry about getting mired in Microsoft (or Autonomy or Google or...), particularly given the uncertainty and unrest before the FAST acquisition, which may have led to an employee brain drain there. Interest in Solr is boosted further by the arrival of Lucid Imagination, a company started in 2007 to commercialize Lucene and Solr.
I talked Wednesday with Marc Krellenstein, co-founder and CTO at Lucid Imagination, and learned that while Lucene and Solr have been doing exceptionally well on their own, Lucid Imagination is in pole position to help advance the development of these open-source projects by offering dedicated development resources and to make a solid business for itself in the process.
Just having a company associated with Lucene and Solr may already be enough to get enterprises off the fence and behind the search project.
According to Krellenstein, Solr delivers significant performance improvements over proprietary alternatives. The goal is to keep expanding its functionality, which currently covers roughly 80 percent of what rival search products offer, while also advancing innovation to surpass those rivals. The company and project have been making steady progress in these areas.
Did the Lucene/Solr community just upend billions of dollars in Microsoft, Google, Autonomy, and others' investments in proprietary search? Time will tell, but $1.2 billion for FAST is looking mighty expensive compared with the $0.00 Microsoft could have paid for Lucene.
Unfortunately, that's what its customers may be thinking, too. Like the U.S. intelligence community, for starters, which is now standardizing on Solr/Lucene. Microsoft and its peers must be hoping this will remain confidential.
Not a chance.
Disclosure: I am an adviser to Lucid Imagination.
Microsoft has officially named its next big attempt at squashing Google "Bing." CNET's Ina Fried covered Microsoft CEO Steve Ballmer's commentary on Bing at the D: All Things Digital conference on Thursday, but there's one important thing missing from the discussion and, indeed, from Bing itself:
Microsoft.
As I took a spin through the Bing demo, I was surprised by Microsoft's newfound restraint. Bing is...Bing. It's not branded "Microsoft Live Bing" or "Bing by Microsoft." It's just Bing.
Microsoft has a great brand, but it also has a brand that carries a lot of baggage with it, baggage that its search service (or "decision engine," as it describes Bing) really doesn't need. One of the great failings of Microsoft's past search efforts is that Microsoft tried to tie them into the larger Microsoft experience which, it turned out, wasn't helpful. Microsoft's brand is tied up in the desktop. Search is all about the Web.
Not coincidentally, Microsoft's Xbox has been a huge success in large part because it's a distinct brand with a distinct experience, one that doesn't rely on affiliation with Microsoft's desktop hegemony. Microsoft appears to be learning, perhaps with the U.S. Justice Department as its tutor, that tying products together isn't always the best solution.
So...Bing. It's a good name, and looks to be a great experience, one that makes "search" more of a destination, rather than a launch pad, as highlighted in Ballmer's "D" interview with Walt Mossberg. It's a destination that packages pieces of the Web to present a coherent response to search terms, making Bing more of a portal and less of a search engine.
Yes, in true Microsoft fashion, the maps used are provided by Microsoft and there are ties to other Microsoft products. At first blush, however, this doesn't appear to be heavy-handed. It's certainly no different from how Google prefers its own services to those of competitors.
I gave up on Microsoft Live Search long ago. I just might give Bing an extended fling, however, as it seems content to stand or fall on its own merits, not Microsoft's brand.
Microsoft for years has been warning the world not to use open-source software. Apparently, its Kumo search team didn't get the memo.
As The Register reports, Microsoft's new Kumo search technology is filled with open source and, in fact, the Kumo search team, formerly Powerset, "tr(ies) to use open-source software, if it is available."
In other words, open-source software appears to be the default choice for the Kumo team, not proprietary software. It looks like Microsoft's anti-open-source bubble really has burst.
Indeed, the Powerset-turned-Microsoft-Kumo team's description of its approach reads as if it were written by an open source-friendly IBM:
Instead of creating a proprietary copy of these pieces of infrastructure, Powerset decided instead to turn to Hadoop, a Lucene subproject that is a framework for running data-intensive applications on large clusters of commodity hardware...Unfortunately, there was no Hadoop equivalent to Google's BigTable storage engine.
Because we have benefited greatly by leveraging the available Hadoop technology, Powerset decided to give back to the community by developing an open-source analog to BigTable that is built on top of HDFS (Hadoop Distributed File System). After all, we need to develop it, anyway, it isn't part of the Powerset "secret sauce," and we, in turn, could benefit from contributions from other members of the community.
Is this the future of Microsoft?
At least in the short term, the answer seems to be yes. Microsoft has been demonstrating that while it remains skittish about licensing its important software under an open-source license, it is very keen on consuming open-source software and embedding it into its proprietary products. The admission of jQuery to the Visual Studio code base is just one example.
CIO.com wrote recently that Microsoft has lost its focus, with serious consequences. I think that this is true.
I also believe that Microsoft's fear-mongering around open source cost it years of productivity and quality gains that it could have been delivering to customers through open source. I hope that reign of ignorance is over.
Microsoft CEO Ballmer argues that Microsoft needs to be more disruptive in search, and open source is a way to start moving in that direction; prime rival Google embraces open source unreservedly.
For Microsoft's Kumo search, it certainly is moving in that direction. But when will we see Microsoft embrace open source for its core products? I'm not holding my breath.
In the wake of Google's weekend error that labeled the entire Web as malware, some like CMS Watch analyst Kas Thomas are asking a provocative and timely question: have we become too dependent on Google?
One wonders: If Google were to go down (or become essentially unusable -- same thing) for, say, 72 hours or more, how disruptive would it be to the economy? Would online retailers see a slowdown in business? Would job-seekers remain out of work longer? Would the productivity of information workers (who supposedly spend a couple hours per day doing online searches) be seriously affected?...
Sometimes even the most highly distributed, highly virtualized, "enterprise-hardened" infrastructure is no stronger than its weakest component. And quite often, the weakest component is human. That's never going to change--cloud or no cloud.
In the case of the Google error, which was caused by a simple human mistake, the world arguably went its merry way without serious disruption. But it's a fair question, and the same one formerly raised about Microsoft's dominance on the desktop. When one company dominates a market so completely, does it become an essential facility and hence require government regulation to ensure that it doesn't bottleneck the economy?
I'm not sure. I tend to eschew government regulation whenever possible, and I'd hate to see Google significantly constrained by U.S. oversight. Even so, the weekend snafu demonstrates just how vulnerable Google is to attack, as well as how susceptible we'd be to going down with Google.
Yes, other search engines are just a click away, but with more and more people enveloping their online lives with Google products (Gmail, News, Finance, Reader, etc.), an error in one aspect of Google's product suite could have a domino effect on all of them, and significantly hamper productivity until Google fixes the source error.
Even so, the answer to Microsoft's dominance wasn't regulation: it was competition. Google, too, will face increased competition on the Web, so perhaps the answer to the concern is simply to wait. Over time, open source and other trends will no doubt diminish the relevance of Google's stranglehold in online search.
But for now, I can't help but feel a little vulnerable.
In a hugely interesting Piper Jaffray research note reported in Barron's, analyst Gene Munster suggests a few strategies for Yahoo's incoming CEO Carol Bartz, among them that Yahoo should acquire a major media company like The New York Times (good idea), but also that it should outsource search to Microsoft.
Search has never been a core competency for Yahoo, and outsourcing will both generate short-term cash and allow Yahoo to focus on content.
Not core? If search hasn't been core for Yahoo, for whom is it core? Google, yes, with nearly 70 percent of the search market, but is Google the only one that has search in its DNA?
Munster's research note suggests that Microsoft can claim search DNA, but its track record doesn't necessarily confirm this. Lucene and the new company around it, Lucid Imagination, have search in their blood, but enterprise search, not Web search.
I would have thought that Yahoo, with its second-place market share in search, should be credited with having interest and competency in search too, but Munster apparently disagrees. Does this leave us with Google versus everyone else, with "everyone else" roughly translating to "Microsoft"? This doesn't seem like a very healthy market dynamic, and certainly not one that will generate real innovation in search.
Perhaps it's true that the recession will spur Google on to innovation, but perhaps there are other factors that will encourage competition, as fellow CNET Blog Network writer Don Reisinger suggests.
Regardless, it's as worrisome to see Google owning nearly 70 percent of the search market as it was for Microsoft to own more than 90 percent of the desktop operating system and productivity suite markets. Google needs competition.
If you were looking to create a start-up, and particularly an open-source start-up, you could hardly do better than to stumble upon a pre-existing open-source project with millions of downloads, widespread adoption by some of the biggest names in the industry, and a fast-growing enterprise need.
Take Lucene, for example, as CMS Watch's Kas Thomas noted on Monday. It is a hugely popular project with one big failing: no enterprise support. Writes Thomas:
Lucene has a lot going for it...(It's) one of the safest (open-source projects) around, in terms of governance and oversight (through the Apache Foundation), the maturity of the code, the amount of active development going on, the size and vitality of the user ecosystem, and the number of high-traffic Web sites that have validated the technology in real-world applications (some better-known examples being Monster.com, Netflix, and Wikipedia).
Perhaps reflective of all this, Lucene has become a top-five Apache project, with 7,000 downloads a day.
But one thing Lucene is not is an out-of-the-box solution...To go from Lucene to a ready-to-deploy solution requires programming (and lots of it). And when you have a problem, there's no phone number to dial in the middle of the night. It's just you, the source code, and the community.
Enter Lucid Imagination, a commercial Lucene company that on Monday announced a $6 million Series A round of venture financing from Granite Ventures and Walden International, which also invested in SugarCRM.
Started by Eric Gries in 2007, the company already has a full roster of customers that includes Netflix, Hewlett-Packard, FedEx, Orbitz, AOL, Apple, Comcast, and Zappos, which sets it apart from Gries' former venture, Levanta (formerly Linuxcare), where he was CEO.
In fact, in talking with Gries several times over the past few months as a member of Lucid's advisory board, it became apparent to me that the Levanta experience may well prove to be one of the best reasons to be optimistic about Lucid, in addition to its stellar roster of engineers and Doug Cutting, the founder of Lucene, as an adviser.
Enterprise search is a growing market, and Lucene (and its more commercially friendly sibling, Solr) is keeping pace. The question then becomes whether Lucid and Gries can provide enough value around Lucene to warrant companies such as Netflix spending big with Lucid rather than rolling their own Lucene-based search solution.
I think they can, because the company is being run by people who have learned the hard way how to ensure open-source success. As an adviser to Lucid, I'm somewhat biased, and doubly so because my own company uses Lucene as part of our content management solution, so I've felt the power and pain of Lucene firsthand. But I believe that this is a space to watch and a company worth watching.
With Yahoo sporting a new CEO, Microsoft is likely to make a run at buying its search business again. The question, however, is whether it's simply too late for the software giant to make a credible bid to catch up to and surpass Google in paid search.
As suggested in an excellent, probing article in The Wall Street Journal, Microsoft, not Google, may well be the reason for Microsoft's virtual nonexistence in paid search:
The story of Microsoft's early missteps helps explain how Google became the uncontested leader in making money from Internet searches, and why Microsoft is trying so hard to make up for lost time. It also exposes a broader challenge facing Mr. Ballmer, as he guides his company of nearly 100,000 employees: how to foster groundbreaking technologies and businesses that are under his nose.
With investments into nearly every major area of software, Microsoft has plenty of innovative ideas and technologies. Its challenge is deciding which ones to nurture. But as Mr. Ballmer manages Microsoft without (Bill) Gates...he said the Keywords episode and similar missteps are at the front of his mind.
"The biggest mistakes I claim I've been involved with is where I was impatient--because we didn't have a business yet in something, we should have stayed patient," Mr. Ballmer said in an interview. "If we'd kept consistent with some of the ideas" that Microsoft had in-house in 1999, "we might have been in paid search."
Has anything changed? Historically, Microsoft has chased the wrong competitors, or the wrong competitive strategies, on the Web. As the Journal explains, Microsoft's early foray into Internet revenue models had it fixated on AOL and subscription-based content, rather than on search-based advertising. The same thing could be happening now.
In focusing on Google, is Microsoft neglecting the bonanza for vendors that monetize the conversations happening on Twitter, Facebook, and elsewhere?
Ballmer cites his lack of patience as the reason Microsoft struggled to make a dent in Google's search revenue, but ZDNet's Larry Dignan offers a more pointed reason: lack of focus.
Lack of focus may be tied to impatience, but it was almost certainly Google's make-this-work-or-go-under approach to search that helped it figure out search advertising. Microsoft makes billions of dollars on Office and Windows. That is why its best new product in years is SharePoint, which marries the two, and not MSN Live or other Web products, which are a distraction from Microsoft's core business.
Patience may be an attribute that Microsoft should develop, but I believe that focus and ambition are more important. Microsoft will struggle to truly focus on the Web while offline desktop and server revenues continue to pay the bills.
Cuil, the new and "improved" search engine created by Google veterans, has failed abysmally to make a dent against its alma mater, Google, according to TechCrunch. Clearly something other than a full-frontal assault is going to be needed to displace Google as the search leader.
But why is Google the search leader?
Tim O'Reilly points to Google PageRank as "Google's breakthrough in search" that "quickly made it the undisputed search market leader." Maybe, but consumers don't think that way. My parents' use of Google actually has little to nothing to do with the quality of the search.
I'm not sure any of ours does, ultimately. I've spent the last two days tinkering with searches on Microsoft Live Search, Google, and Yahoo, and on a pure quality basis it's hard to tell the three apart. I'm sure some objective science could be made of Google's superiority, but that's not how people search. If you're looking for "table salt" on Google, how do you know that the results returned are better than those on Yahoo? Answer: you don't.
In fact, the times that I can't find something with a search engine have much more to do with the quality of my search terms than with the quality of the algorithms informing the search, and no search engine really helps much with prompting better search terms. How could they?
Ultimately, then, I think we use Google out of habit, not superior search. For most of us, it's the search engine to which our trusted computer adviser pointed us, and we've never looked back. Why would we? Because we don't have any way of independently verifying that a competitor would give us better search results, there really is no justification for switching.
So, Google is a habit. But it's not one that Google is willing to lackadaisically take for granted. Instead, it is building all sorts of ancillary services (Gmail, Picasa, etc.) that by themselves provide little add-on revenue opportunity but ensure that when we search, we never have reason to look beyond Google search, its cash cow.
All of which means that much as Google has learned from the disruptive Web, it has perhaps learned more from the desktop. Microsoft, king of the desktop, makes comparatively little from its businesses outside of Windows and Office, but all the add-on value ensures that the vast majority keep feeding its cash cows to the tune of billions in profits every quarter. Microsoft is a habit, too. People could fairly easily switch to Linux and OpenOffice, but they don't. The bother of change doesn't outweigh the ease of habit.
The only way to displace Google in search may well be to follow Apple's approach to displacing Microsoft on the desktop: change the game. Apple turned the desktop business into a creative/entertainment pursuit, blending the desktop (iLife suite of products, plus extensions of the desktop like the iPhone and iPod) with the cloud (iTunes, App Store). Apple has a long way to go, but it's taking market share from Microsoft at a respectable clip.
In other words, for competitors looking to kick the Google search habit, you can't take the Cuil route and compete on search. It just won't matter if you're better. You need to create a different, compelling habit.




