August 31, 2009 5:23 AM PDT

Why would a Googler use Solr for search?

by Matt Asay

Google is arguably the world's largest open-source company, not only releasing a minimum of 14 million lines of open-source code but also hosting over 250,000 open-source projects on Google Code, in addition to its open-source advocacy work like Summer of Code.

Despite these open-source bona fides, it's still surprising to see someone at Google adopting Solr, an open-source search server based on Apache Lucene, for its All for Good site.

Google is the world's search market leader by a very long stretch. Why not use its own search technology? Why use Solr?

Google's Public Sector team suggested an answer last week:

One of the top concerns we've been hearing from nonprofit organizations who list volunteer opportunities on All for Good is that their opportunities aren't updated on the site as frequently as they need. This happens because...we crawl feeds from partners like VolunteerMatch and Idealist just like Google web search crawls web pages. Crawlers don't immediately update, they take time to find new information.

Today, we're rolling out improvements to All for Good that will help solve this problem and improve search quality for users. The biggest change, which you won't see directly, is that our search engine is now powered by SOLR, an incredible open source project that will allow us to provide higher quality and more up-to-date opportunities. Nonprofits should start seeing their opportunities indexed faster, and users should see more relevant and complete results.

I don't think this means that Google thinks Solr provides better results than its own code. Rather, I suspect this was simply a case of a Googler using her 20 percent "free" time to get a job done. It was likely easier to roll a service using Solr than to get official approval from Google to use its search technology for an important but nonprofit purpose. (My request for comment by Google had not been answered at the time of this post's publication.)

To me, this says much about the power of Google's culture: Googlers appear to be unfettered to use the best tool to get a job done, which may not always be the best technology, per se, but simply the most easily available technology for a given project at a given time.

The decision also says a tremendous amount about the value of open source, and of Solr in particular. If it's good enough for Google, as David Fishman notes, it's probably going to be just fine for you, too.

Update, 11:41 a.m. PDT: I heard back from Chris DiBona, open source and public sector program manager at Google, who offered this reasoning behind the move, in response to the suggestion that Google uses Solr:

I think you meant "Googler chooses Solr." You see, Allforgood.org is run by Our Good Works, a non-profit that works with technology companies and the whitehouse on that site. I'm on the Board of OGW, but it is run by Jonathan Greenblatt.

That said, we chose Solr because it made sense for the project we had. We want other companies/countries to be able to use the code we've written for Allforgood.org and to have it depend too heavily on Google Base precluded that, but specifically, technically speaking Solr fit the problem better than Google Base did.

So, it's not accurate to say that "Google chose Solr," but it is accurate to suggest that All for Good was founded by Googlers in their "20-percent time" and continues to be hosted by Google, as TechCrunch has reported, and that those Googlers, along with the rest of the board, opted for Solr over Google.

As DiBona mentions, and as I blogged above, this is a reflection of fit-for-purpose, and not any problem with Google's code. All for Good is completely open source, so it makes sense that it would opt for open-source Solr over Google Base.


Follow me on Twitter @mjasay.

Matt Asay brings a decade of in-the-trenches open-source business and legal experience to The Open Road, with an emphasis on emerging open-source business strategies and opportunities. Matt is vice president of business development at Alfresco, a company that develops open-source software for content management. He is a member of the CNET Blog Network and is not an employee of CNET. Disclosure. You can follow Matt on Twitter @mjasay.
Comments
by FellowConspirator August 31, 2009 6:19 AM PDT
Well, for starters, the algorithm that Google uses, their "PageRank", is inappropriate for the application. "PageRank" takes into account such aspects as the click-through rates on search results, frequency of external references, etc. and they're based on whole-text. In the case of Solr and Lucene, the fitness (in the evolution sense) portion of the algorithm isn't there since that would defeat part of the purpose. On top of that, the information has a regular structure that's broken down into individually indexed sub-fields, some with their own ontologies and thesauri.

Google's search engine is very good at what it does, but in this case, the objective is to do something different. It's what we in the industry call "using the right tool for the job".
by compbry15 August 31, 2009 7:28 AM PDT
I agree with FellowConspirator. This was a pretty lame article, in my opinion. Solr and Google's PageRank are two very different beasts, both excelling at their own strengths.

Saying "Googlers appear to be unfettered to use the best tool to get a job done, which may not always be the best technology, per se, but simply the most easily available technology for a given project at a given time" is not accurate, because it assumes that Google's software is the best for all situations. This is simply not the case.
by ajayg514 August 31, 2009 8:08 AM PDT
My confusion is: "is that their opportunities aren't updated on the site as frequently as they need."

Whose site? All for Good? Google Search? Adsense? Craigslist? Google List?

In terms of Google Search:

The concern seems to be about updates, and from what I know, PageRank, lastmod, and sitemaps are things you can use to help with the updating. What else could help the "for-profit" websites? It should be interesting to see how this affects search engine results...
by ajayg514 August 31, 2009 8:15 AM PDT
Also, on my website I have found that higher-traffic pages get updated by Google a lot more frequently. [CNET editors' note: URL removed.]
by IncredibleMouse September 1, 2009 2:15 AM PDT
I think they chose it because the name Solr is obviously handicapped with its missing letter and all. I'm having a hard time saying the word vs spelling it like an acronym.
by shadfurman September 2, 2009 3:59 PM PDT
I do like how Google does not try to dictate control over many of its projects. I often see Google contrasted with Microsoft, but I see these companies as being very similar in many respects. I would contrast Google with Apple: Apple maintains strict control over pretty much 100% of its products and is hyper-focused on only a few projects. I can't say that Google's method is better, really, because with Apple's strict control they do maintain a fantastic public image and release some awesome, sexy products, but I do LIKE Google a lot more as a company.