Out-Googling Google, a la Krugle
Laura Merling
Krugle has been silent for the past year. I was actually worried that the company had fizzled out, but--as I learned from Laura Merling today--nothing could be further from the truth:
We released the Enterprise product as GA, with 16 substantial companies (most are Fortune 100 or Fortune 500) that are now using it. No one will give us our evaluation search appliance back! Even those who are just evaluating our product refuse to drop Krugle once they've started using it. They've been providing their use cases to us to help us improve our services.
Krugle is doing well, in part, because it has figured out how to profit from open source even though it is not itself open source:
We have enjoyed the benefit of open source in several ways. First, our customers have found us through our public site or the Krugle open-source code search engine. Second, one of the main reasons we see a lot of early companies go to an open-source model for sales and marketing is that they get individuals and developers to leverage and try things quickly and easily.
While Krugle is built on things like Nutch, Lucene, Apache, ANTLR, etc., we do not sell in the open-source model. We have benefited from it, however, because we have the luxury of the power shift that open source created. The power for decision making and buying has shifted to the developers, architects and midlevel managers. We are in some very large companies and have not once talked to a CIO! Developers have paved the way with their newfound power.
So what does Krugle actually do? Search. Niche search in a very big niche:
We're shipping a Krugle search appliance. The customer points it at their code (across an entire enterprise spanning multiple code repositories) and indexes everything, making it all easily searchable. Krugle is important when you want to search across multiple programming languages and code repositories. This makes enterprise development collaborative and permeable, rather than siloed and opaque. Search helps developers find the right code/information "just in time," making them much more productive.
One great example of this is Krugle's soon-to-be-announced partnership with IBM to power code search on DeveloperWorks, one of the more innovative developer sites in the industry. What does this mean? It means that IBM will enable code search for every general search on DeveloperWorks, which covers every product IBM has in its portfolio. In addition to being available as part of the general search capabilities within DeveloperWorks, IBM has plans to include a button on every article page that allows for code search on the sample code associated with the article.
What is the scope of the project? According to Merling, through this agreement Krugle becomes the first and preferred hosted search engine to have access to IBM code files in its index. A lot of code is involved: Krugle has indexed more than 1,400 articles on DeveloperWorks, producing more than 29,000 source files with more than 4 million lines of code. The index includes code in more than 35 languages including C, XML and Java.
Clearly, IBM is betting big on Krugle. That's a serious stamp of approval, handing over 4 million lines of code for indexing.
For my part, it's prompting me to take a close look at Krugle again. The company went quiet for a while, but apparently not because it wasn't busy. With its Fortune 500 deals and this partnership with IBM, Krugle is gaining momentum, which is curious since this should have been Google's game to lose. When was the last time you heard of Google Code Search? Exactly.
With Krugle showing the way to make money in this market, perhaps it will wake the sleeping Google giant. Maybe the giant will buy Krugle. Stranger things have happened...
Matt Asay brings a decade of in-the-trenches open-source business and legal experience to The Open Road, with an emphasis on emerging open-source business strategies and opportunities. Matt is vice president of business development at Alfresco, a company that develops open-source software for content management. He is a member of the CNET Blog Network and is not an employee of CNET. You can follow Matt on Twitter @mjasay.


Am I right in my understanding that Krugle's search solution is still based on keyword recognition (like solutions of other players - Google, FAST or Convera) and has to rely on taxonomies?
But what if the keyword is misspelled? Or if you do not know exactly what you are looking for, just have a vague notion?
There are other solutions out there that deal with this problem by imitating the way the human brain works (we don't look for keywords, we look for patterns). For example, Brainware possesses a unique, patent-protected technology that sets it apart from other data capture and enterprise search solutions providers. Its products are powered by the world's only engine that does not rely on exact definitions to rapidly sift through mountains of unstructured data.
Brainware's technology allows it to recognize and find data through inexact definitions, patterns and context, mimicking the way the human brain processes and sorts information.
Here's a case study showing Brainware in action:
Fulbright & Jaworski: Leading Law Firm Searches And Shares Knowledge Base Smarter, More Accurately
http://www.brainware.com/brain_case_lawfirm.php
http://altsearchengines.com/2007/07/13/top-10-alternative-code-search-engines-montage/
Charles Knight, editor
www.AltSearchEnines.com
Re: Does Krugle rely on exact definitions?
by kkrugler October 4, 2007 4:58 PM PDT
Hi Yegor,
Thanks for the reference to Brainware - interesting technology.
As to your question about how we search, there are two key things we do (at the lowest level) that differ from typical keyword search by Google, FAST, etc.
First, we do a fuzzy parse of the code and use the resulting syntax tree to automatically classify parts of the code. This lets us separate out comments from code, and regular statements from function calls, function definitions, and class definitions.
By doing so, we can provide better search results for general queries, and also let users perform more explicit searches.
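To make the classification idea concrete, here is a minimal sketch. It is not Krugle's parser: it swaps the fuzzy parse and syntax tree for a handful of regular expressions applied line by line, which is far cruder. The pattern list, the category names, and the Python-like sample are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical patterns for a small Python-like subset, checked in order.
# A real fuzzy parser would build a syntax tree; this is a regex stand-in.
PATTERNS = [
    ("comment", re.compile(r"^\s*#")),
    ("class_definition", re.compile(r"^\s*class\s+([A-Za-z_]\w*)")),
    ("function_definition", re.compile(r"^\s*def\s+([A-Za-z_]\w*)")),
    ("function_call", re.compile(r"\b([A-Za-z_]\w*)\s*\(")),
]


def classify_line(line):
    """Return the first matching category, or 'statement' if none match."""
    for category, pattern in PATTERNS:
        if pattern.search(line):
            return category
    return "statement"


def classify_source(source):
    """Group the non-blank lines of a source string by category."""
    buckets = defaultdict(list)
    for line in source.splitlines():
        if line.strip():
            buckets[classify_line(line)].append(line.strip())
    return dict(buckets)


SAMPLE = """\
# compute a digest
class Hasher:
    def update(self, data):
        self.state = mix(self.state, data)
        count = 0
"""

if __name__ == "__main__":
    for category, lines in classify_source(SAMPLE).items():
        print(category, lines)
```

Once lines are bucketed this way, a query can be restricted to one bucket (say, only function definitions) instead of matching every occurrence of a word.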
Second, when we parse code, we use different tokenizers to help with extracting sub-terms. So searching on "context", for example, finds methods named getContext, set_next_context, and context-classifier.
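As a rough illustration of sub-term extraction (again a sketch under assumptions, not Krugle's actual tokenizers), the snippet below splits identifiers on underscores, hyphens, and camelCase boundaries, then indexes each identifier under every sub-term. The function names sub_terms and build_index are hypothetical.

```python
import re


def sub_terms(identifier):
    """Split an identifier into lowercase sub-terms.

    Handles snake_case, kebab-case, and camelCase; runs of capitals
    such as 'XML' in 'parseXMLFile' are kept together as one term.
    """
    terms = []
    for part in re.split(r"[_\-\s]+", identifier):
        terms.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [term.lower() for term in terms]


def build_index(identifiers):
    """Map each sub-term to the set of identifiers that contain it."""
    index = {}
    for identifier in identifiers:
        for term in sub_terms(identifier):
            index.setdefault(term, set()).add(identifier)
    return index


if __name__ == "__main__":
    names = ["getContext", "set_next_context", "context-classifier", "getName"]
    index = build_index(names)
    print(sorted(index["context"]))
```

With this index, a lookup on "context" returns all three identifiers from the example above, while "getName" is only reachable through "get" or "name".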
It would be great to support natural language searches on code, so a programmer could enter something like "I'm looking for a fast implementation of the SHA1 algorithm in C, with full unit tests" and we would return only matching hits.
Unfortunately for most code, most of the time, the level of commenting is insufficient to extract this even with a human looking at the source. And programmatically figuring this out is one of several "holy grails" for code analytics tools, but we're nowhere close yet.
So in the meantime, we focus on what can be done now, using existing technologies that scale to the size we need (2.5 billion lines of code as of today).
But now I'll have to look at how we might use alternative technologies to do a better job of searching our 120K project descriptions, which are in English (or at least the kind of English that programmers use)...thanks!