September 27, 2007 3:41 PM PDT

Out-Googling Google, a la Krugle

by Matt Asay


Krugle has been silent for the past year. I was actually worried that the company had fizzled out, but as I learned from Laura Merling today, nothing could be further from the truth:

We released the Enterprise product as GA, with 16 substantial companies (most are Fortune 100 or Fortune 500) that are now using it. No one will give us our evaluation search appliance back! Even those who are just evaluating our product refuse to drop Krugle once they've started using it. They've been providing their use cases to us to help us improve our services.

Krugle is doing well, in part, because it has done a good job of figuring out how to profit from open source, even though it is not itself open source:

We have enjoyed the benefit of open source in several ways. First, our customers have found us through our public site or the Krugle open-source code search engine. Second, one of the main reasons we see a lot of early companies go to an open-source model for sales and marketing is that they get individuals and developers to leverage and try things quickly and easily.

While Krugle is built on things like Nutch, Lucene, Apache, ANTLR, etc., we do not sell in the open-source model. We have benefited from it, however, because we have the luxury of the power shift that open source created. The power for decision making and buying has shifted to the developers, architects and midlevel managers. We are in some very large companies and have not once talked to a CIO! Developers have paved the way with their newfound power.

So what does Krugle actually do? Search. Niche search in a very big niche:

We're shipping a Krugle search appliance. The customer points it at their code (across an entire enterprise spanning multiple code repositories) and indexes everything, making it all easily searchable. Krugle is important when you want to search across multiple programming languages and code repositories. This makes enterprise development collaborative and permeable, rather than siloed and opaque. Search helps developers find the right code/information "just in time," making them much more productive.
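Merling doesn't describe the appliance's internals beyond noting that Krugle builds on projects like Nutch and Lucene. As a rough, hypothetical illustration of the basic idea (index code from several repositories once, then answer one query across all of them), here is a minimal inverted index in Python. The repository names, file contents, and helper functions are made up for the example, and a real product would rely on an engine like Lucene rather than an in-memory dict.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for code pulled from several repositories:
# {repository name: {file path: source text}}
REPOS = {
    "billing-svn": {
        "src/Invoice.java": "class Invoice { double total() { return 0; } }",
    },
    "web-git": {
        "app/invoice.py": "def render_invoice(invoice):\n    return str(invoice)\n",
    },
}


def tokenize(text):
    """Naive tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(repos):
    """Map each token to the set of (repository, path) pairs containing it."""
    index = defaultdict(set)
    for repo, files in repos.items():
        for path, source in files.items():
            for token in tokenize(source):
                index[token].add((repo, path))
    return index


def search(index, query):
    """Return the locations that contain every query token (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    index = build_index(REPOS)
    # One query spans both repositories and both languages.
    print(sorted(search(index, "invoice")))
    print(sorted(search(index, "invoice total")))
```

Because the index is built once over every repository, a query like "invoice" finds both the Java and the Python file without the user knowing where either lives.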

One great example of this is Krugle's soon-to-be-announced partnership with IBM to power code search on DeveloperWorks, one of the more innovative developer sites in the industry. What does this mean? It means that IBM will enable code search for every general search on DeveloperWorks, which covers every product IBM has in its portfolio. In addition to offering code search as part of the general search capabilities within DeveloperWorks, IBM plans to include a button on every article page that allows for code search on the sample code associated with the article.

What is the scope of the project? According to Merling, through this agreement Krugle becomes the first and preferred hosted search engine to have access to IBM code files in its index. A lot of code is involved: Krugle has indexed more than 1,400 articles on DeveloperWorks, covering more than 29,000 source files and more than 4 million lines of code. The index includes code in more than 35 languages, including C, XML and Java.

Clearly, IBM is betting big on Krugle. Handing over 4 million lines of code for indexing is a serious stamp of approval.

For my part, it has prompted me to take a close look at Krugle again. The company went quiet for a while, but apparently not because it wasn't busy. With its Fortune 500 deals and this partnership with IBM, Krugle is gaining momentum, which is curious since this should have been Google's game to lose. When was the last time you heard of Google Code Search? Exactly.

With Krugle showing the way to make money in this market, perhaps it will wake the sleeping Google giant. Maybe the giant will buy Krugle. Stranger things have happened...

Originally posted at The Open Road
Matt Asay brings a decade of in-the-trenches open-source business and legal experience to The Open Road, with an emphasis on emerging open-source business strategies and opportunities. Matt is vice president of business development at Alfresco, a company that develops open-source software for content management. He is a member of the CNET Blog Network and is not an employee of CNET. You can follow Matt on Twitter @mjasay.
4 Comments
Code Search
by royrusso September 27, 2007 7:27 PM PDT
On that note, a fascinating website I've seen evolve over the years is http://www.koders.com/ and I would recommend it over any other code search engine I've seen thus far.
Does Krugle rely on exact definitions?
by Yegor Kuznetsov September 28, 2007 5:32 AM PDT
Matt, great story on Krugle!

Am I right in my understanding that Krugle's search solution is still based on keyword recognition (like solutions of other players: Google, FAST or Convera) and has to rely on taxonomies?

But what if the keyword is misspelled? Or if you do not know exactly what you are looking for, just have a vague notion?

There are other solutions out there that deal with this problem by imitating the work of the human brain (we don't look for keywords, we look for patterns). For example, Brainware possesses a unique, patent-protected technology that sets it apart from other data capture and enterprise search solutions providers. Its products are powered by the world's only engine that does not rely on exact definitions to rapidly sift through mountains of unstructured data.

Brainware's technology allows it to recognize and find data through inexact definitions, patterns and context, mimicking the way the human brain processes and sorts information.

Here's a case study showing Brainware in action:
Fulbright & Jaworski: Leading Law Firm Searches And Shares Knowledge Base Smarter, More Accurately
http://www.brainware.com/brain_case_lawfirm.php
The Top 10 Code Search Engines
by CharlesSKnight September 28, 2007 8:30 AM PDT
On AltSearchEngines.com, the new blog from Read/WriteWeb that covers hundreds of alternative search engines, we recently did a post on our Top 10 Code Search Engines (which includes Krugle and Koders and eight more):

http://altsearchengines.com/2007/07/13/top-10-alternative-code-search-engines-montage/

Charles Knight, editor
www.AltSearchEngines.com
Re: Does Krugle rely on exact definitions?
by kkrugler October 4, 2007 4:58 PM PDT
Hi Yegor,

Thanks for the reference to Brainware - interesting technology.

As to your question about how we search, there are two key things we do (at the lowest level) that differ from typical keyword search by Google, FAST, etc.

First, we do a fuzzy parse of the code and use the resulting syntax tree to automatically classify parts of the code. This lets us separate out comments from code, and regular statements from function calls, function definitions, and class definitions.

By doing so, we can provide better search results for general queries, and also let users perform more explicit searches.
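The comment doesn't detail Krugle's fuzzy, multi-language parser. The underlying idea, splitting code into separately searchable fields by syntactic role, can be sketched for Python source alone with the standard library's `ast` and `tokenize` modules. This swaps in Python's strict parser as an illustration; it is not Krugle's implementation, and the field names, `classify` function, and sample snippet are hypothetical.

```python
import ast
import io
import tokenize


def classify(source):
    """Sort the names and comments in Python source into fields by syntactic role.

    This uses Python's strict standard-library parser, not a fuzzy one:
    source that does not parse raises SyntaxError.
    """
    fields = {"comment": [], "class_def": [], "function_def": [], "function_call": []}

    # Comments are discarded by the parser, so read them from the token stream.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            fields["comment"].append(tok.string.lstrip("#").strip())

    # Walk the syntax tree to label definitions and calls.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            fields["class_def"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            fields["function_def"].append(node.name)
        elif isinstance(node, ast.Call):
            target = node.func
            if isinstance(target, ast.Name):  # plain call: f(x)
                fields["function_call"].append(target.id)
            elif isinstance(target, ast.Attribute):  # method call: obj.f(x)
                fields["function_call"].append(target.attr)
    return fields


SAMPLE = '''
# Compute a SHA-1 digest
import hashlib

class Hasher:
    def digest(self, data):
        return hashlib.sha1(data).hexdigest()
'''

if __name__ == "__main__":
    fields = classify(SAMPLE)
    # A field-restricted query: is "sha1" actually called somewhere?
    print("sha1" in fields["function_call"])
```

Keeping the fields separate is what makes more explicit searches possible: a query can be limited to, say, function definitions, so a word that appears only in a comment does not count as a match.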

Second, when we parse code, we use different tokenizers to help with extracting sub-terms. So searching on "context", for example, finds methods named getContext, set_next_context, and context-classifier.
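The comment doesn't say which tokenizers Krugle uses. One plausible way to get the behavior described (a search for "context" hitting getContext, set_next_context and context-classifier) is to split identifiers on punctuation and on case boundaries before indexing. A minimal Python sketch; `subterms` and `matches` are hypothetical names.

```python
import re


def subterms(identifier):
    """Split an identifier into lowercase sub-terms.

    Breaks on underscores, hyphens and other non-alphanumeric characters,
    then on case boundaries: "getContext" -> ["get", "context"].
    """
    terms = []
    for part in re.split(r"[^A-Za-z0-9]+", identifier):
        # Acronyms, capitalized/lowercase words, and digit runs, so that
        # "parseHTTPResponse" -> "parse", "HTTP", "Response".
        terms.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part))
    return [term.lower() for term in terms]


def matches(query, identifier):
    """True if the query word is one of the identifier's sub-terms."""
    return query.lower() in subterms(identifier)


if __name__ == "__main__":
    for name in ["getContext", "set_next_context", "context-classifier", "contextual"]:
        print(name, matches("context", name))
```

Matching whole sub-terms means "context" does not hit contextual; a real engine might add stemming or prefix matching on top of this.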

It would be great to support natural-language searches on code, so a programmer could enter something like "I'm looking for a fast implementation of the SHA1 algorithm in C, with full unit tests" and get back only matching hits.

Unfortunately, for most code, most of the time, the level of commenting is insufficient to extract this, even for a human reading the source. And figuring this out programmatically is one of several "holy grails" for code analytics tools, but we're nowhere close yet.

So in the meantime, we focus on what can be done now, using existing technologies that scale to the size we need (2.5 billion lines of code as of today).

But now I'll have to look at how we might use alternative technologies to do a better job of searching our 120K project descriptions, which are in English (or at least the kind of English that programmers use)...thanks!