One of the cloud-related trends that developers have been paying attention to lately is the idea of "NoSQL," a set of operational-data technologies built on nonrelational designs.
These technologies do not replace the relational database but rather add a new tool to the developer toolbox. Similarly, business intelligence database technologies such as Aster Data, Greenplum, Netezza, and Vertica do not completely replace the traditional relational database but instead augment it with nonrelational approaches.
RedMonk analyst Stephen O'Grady wrote recently that NoSQL "adoption was inevitable because, just as in every other walk of life, there are different tools for different jobs in the technology world." NoSQL may not be exactly the right moniker, but the companies and developers behind these tools make a legitimate case for why the approach is right.
According to Dwight Merriman, CEO of 10gen (the commercial team behind the open-source MongoDB project), we'll see NoSQL complement existing applications for the foreseeable future.
The broad range of NoSQL tools, including projects like Cassandra, CouchDB, Hadoop, Memcached, and MongoDB, brings a number of technical advantages, even if no one tool does everything.
Horizontal scalability
Horizontal scalability, readily achievable for NoSQL solutions, fits incredibly well with cloud computing and general trends in computer architecture--toward more CPU cores rather than faster ones.
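Horizontal scaling means adding machines and spreading the data across them. One technique used by several NoSQL stores, Cassandra among them, is consistent hashing: keys and nodes are placed on a hash ring, so adding a node moves only a fraction of the keys. The following is a minimal Python sketch of that idea, not any product's implementation; the node names, key format, and the choice of 100 virtual points per node are illustrative assumptions.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a position on the ring (MD5 keeps it stable across runs)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hashing ring; each node owns several virtual points."""

    def __init__(self, nodes, points_per_node=100):
        self.points_per_node = points_per_node
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.points_per_node):
            pos = ring_hash(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def node_for(self, key: str) -> str:
        # Take the first node position clockwise from the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._positions, ring_hash(key))
        return self._owner[self._positions[idx % len(self._positions)]]


# Hypothetical three-node cluster that later grows to four nodes.
ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Every key that moves lands on the new node while the rest stay put, which is what lets capacity grow one machine at a time.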
Performance
In some cases, the simpler design of these solutions, along with the lack of normalization in the data, yields better performance. As a result, developers often no longer have to code around the database.
Ease of assembly
Some NoSQL solutions make software development easier. Mapping object data to JSON, a JavaScript data interchange format, is far less complex than mapping it onto relational tables. The "schemaless" nature of many of these products is an excellent fit with agile development methodologies.
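As a minimal illustration, assuming a hypothetical `Customer` record, Python's standard library turns an object into a JSON document in one call, with nested data such as a list of tags kept inline rather than split into a separate table. The plain list at the end only stands in for a document collection (it is not any particular database's API) and shows what "schemaless" means in practice: two documents with different fields sit side by side.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Customer:
    name: str
    email: str
    tags: List[str] = field(default_factory=list)  # nested list, no join table


alice = Customer("Alice", "alice@example.com", ["beta", "vip"])
doc = json.dumps(asdict(alice))
print(doc)  # {"name": "Alice", "email": "alice@example.com", "tags": ["beta", "vip"]}

# "Schemaless": documents in the same collection need not share fields.
collection = [
    json.loads(doc),
    {"name": "Bob", "phone": "555-0100"},  # no email, but an extra phone field
]
emails = [d.get("email") for d in collection]
print(emails)  # ['alice@example.com', None]
```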
The typical software system of moderate complexity has many real and conceptual internal data stores. No one technology will be the right solution for all problems.
Forward-looking organizations should look at which technologies are appropriate for different data subsystems and begin to evaluate NoSQL technologies for appropriate projects.
Most businesses seek competitive advantage through some kind of change. Whether they want to beat the competition to market with a new service or introduce new product categories, disruption is the norm.
The challenge in today's IT-centric world is that every one of those disruptions requires a software change, introducing the potential for downtime and lost revenue.
Change control and the associated risk mitigation are a big problem that every large organization faces. Last year, the London Stock Exchange crashed during a software change and was down for more than seven hours, costing traders millions, if not billions, of dollars in lost business. This year we've had high-profile outages at Salesforce.com, Twitter, and Amazon's EC2, among others, affecting tens of millions of people.
No company is immune to this type of risk, and companies that want to stay on the leading edge need to embrace change in order to stay competitive.
Coverity, a software integrity firm perhaps best known for its SCAN project for open-source software, sponsored by the Department of Homeland Security, thinks it has the preventive medicine to help organizations avoid the errors, defects, and failures that software change inevitably introduces.
The company's latest release, Coverity 5, promises to mitigate the business risk of software changes across an organization's entire software portfolio. It claims this is the first product that lets developers automatically map and identify how a single defect impacts multiple code bases, projects, and products. Through a unified defect management interface, it also can help organizations review, prioritize, and triage their C/C++, Java, and C# defects in a single workflow.
This approach lets an organization quickly answer five key questions of software change management:
- How do I find defects introduced by changes?
- How do I know the severity of new defects?
- How do I know the impact to my code, my projects, my products?
- How do I fix them fast?
- How do I know I fixed them?
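The third question, about impact on code, projects, and products, boils down to knowing which deliverables share the code where a defect lives. The Python sketch below illustrates only that general idea; it is not how Coverity 5 works, and the project names, file paths, and the `impacted_projects` helper are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical portfolio: the source files each project builds from.
PORTFOLIO: Dict[str, Set[str]] = {
    "billing-service": {"lib/parse.c", "billing/main.c"},
    "reporting-app": {"lib/parse.c", "reports/Report.java"},
    "mobile-client": {"mobile/App.cs"},
}


def impacted_projects(defect_file: str, portfolio: Dict[str, Set[str]]) -> List[str]:
    """Return every project whose build includes the file containing a defect."""
    return sorted(name for name, files in portfolio.items() if defect_file in files)


# One defect in shared library code touches two products at once.
print(impacted_projects("lib/parse.c", PORTFOLIO))  # ['billing-service', 'reporting-app']
```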
Today, market opportunities are changing faster than businesses can deliver. When your organization changes software, how quickly can you answer the five questions above?
The Tennessee Valley Authority is the nation's largest public power provider, serving approximately 9 million consumers in seven southeastern states. The organization also happens to be a big supporter of open-source projects, including Hadoop, a tool designed for deep analysis and transformation of very large data sets.
Earlier this year, the Tennessee Valley Authority (TVA) announced that it had open-sourced the data system it uses to collect data from smart grid devices called phasor measurement units (PMUs). The system is known in the industry as a Super Phasor Data Concentrator (SuperPDC) and can be used to determine the health of a power grid.
The open-source version of the SuperPDC is now called the "OpenPDC." I spoke to both Ritchie Carroll (RC), the project's creator, and Josh Patterson (JP), the person responsible for introducing Hadoop to the project, to discuss what the OpenPDC is and why TVA turned to Hadoop in building the system.
What sort of data volumes are you working with?
RC: Currently there is around 20 TB of archived data, and we expect this to grow quickly as a result of the SmartGrid stimulus funding, which includes the addition of 850 phasor measurement devices. This may well grow the archive to half a petabyte within the next few years.
How is this data currently captured and managed? Is any data discarded?
JP: Data is collected directly from field devices 30 times per second. This data is then time-aligned and processed in real time, and all of it is captured into binary data files as time-series data for mass processing by Hadoop.
RC: No data is currently discarded. If we get to the point of needing to discard data because of cost, that will be a decision based on the relative importance of the collected data. Data around major events will likely never be deleted, because it will always be valuable for future student researchers. There is also value in being able to go back in time and look for newly discovered event signatures to see how long they might have been occurring.
Influence in open-source development communities is earned through years of writing and sharing great code. Perhaps not surprisingly, then, influence in the business side of open source is also gained through sharing expertise, and not necessarily from making mountains of cash.
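To make "time-aligned" concrete: with devices reporting 30 times per second, each device produces 30 × 86,400 = 2,592,000 measurements per day, and aligning means grouping readings from different devices that belong to the same instant. The following is a minimal Python sketch, not the OpenPDC's actual code; it assumes each sample is a hypothetical `(device, timestamp_seconds, value)` tuple and snaps timestamps to the nearest 1/30-second frame.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

FRAMES_PER_SECOND = 30  # reporting rate described in the interview


def frame_index(timestamp: float) -> int:
    """Snap a timestamp in seconds to the nearest 1/30-second frame number."""
    return round(timestamp * FRAMES_PER_SECOND)


def time_align(samples: Iterable[Tuple[str, float, float]]) -> Dict[int, Dict[str, float]]:
    """Group (device, timestamp, value) samples into frames keyed by frame number."""
    frames: Dict[int, Dict[str, float]] = defaultdict(dict)
    for device, timestamp, value in samples:
        frames[frame_index(timestamp)][device] = value
    return dict(frames)


# Hypothetical readings from two PMUs whose clocks differ by a fraction of a millisecond.
samples = [
    ("pmu-1", 0.0001, 60.01), ("pmu-2", 0.0004, 59.99),
    ("pmu-1", 0.0334, 60.02), ("pmu-2", 0.0331, 60.00),
]
print(time_align(samples))
```

Each frame then holds one reading per device for the same moment, which is the shape that downstream batch analysis can consume.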
At least, that's the lesson I take away from MindTouch's inaugural survey of 50 open-source business executives. MindTouch, an open-source collaboration company, has spent the last few months surveying executives within the commercial open-source community, asking them to name the most influential people within the commercial open-source ecosystem.
The result is effectively an all-star list of open-source business executives. The top five are as follows:
- Larry Augustin, CEO, SugarCRM
- Matt Asay, vice president of business development, Alfresco (and fellow CNET blogger)
- Mårten Mickos, entrepreneur-in-residence, Benchmark Capital, and former CEO, MySQL
- Jim Whitehurst, CEO, Red Hat
- Dries Buytaert, co-founder and CTO, Acquia
MindTouch has published the full list.
The common theme running through these top-five vote getters is how open they've been with their peers. Larry Augustin sits on several boards of open-source companies, but he also frequently speaks at industry events and has been involved in open source from its inception.
Matt Asay, my friend and fellow CNET blogger, sits on more than 10 open-source advisory boards, chairs the Open Source Business Conference, hosts an informal get-together every year (called Open Source Goat Rodeo--don't ask why), blogs at an unhealthy rate for CNET on open source, and has actively helped a range of aspiring open-source entrepreneurs understand the mechanics of running an open-source business.
Mårten Mickos made the world safe for the $1 billion open-source acquisition, but he has also traveled the globe speaking at open-source events and is very generous with his time, sharing know-how and best practices with other open-source executives.
Jim Whitehurst, breaking the typical Red Hat mold, has been active at industry events and has hosted a range of dinners and other small, intimate gatherings with open-source executives. He is amazingly accessible, given that he has a fast-growing open-source company to run. It's unfortunate that Whitehurst is the only Red Hat executive to make the list; Red Hat should follow his lead and be more permeable to its peers. Its influence would grow accordingly, just as Whitehurst's has.
Finally, there's Dries Buytaert, who blogs frequently on his project, Drupal, but also regularly attends and speaks at industry events. He has also been active behind the scenes, working with other open-source companies to share information on how to optimize community development.
Open-source code becomes valuable when you give it away. The same holds true for open-source business expertise. Some individuals have made more money from open-source software than these five, but in terms of influence, the more you share, the more influential you become.
What do you think? Who else should be on the list? Who influences you?
If you need further proof that open-source applications are ready for prime time, take today's news from open-source business intelligence company Jaspersoft, which announced that British Telecom is using its business intelligence suite to support more than 8 million voice mail subscribers.
BT and Unisys, a longtime Jaspersoft partner, say they chose Jaspersoft for its modular design, which reduces maintenance and cost and gives them customization abilities that improve capacity planning.
The deal with BT also represents how important a solid channel strategy is for open-source software companies.
Jaspersoft CEO Brian Gentile has in the past mentioned that the BI market is heavily influenced by a few technical aspects, including SOA/Web services (and overall componentized design), in-memory analytics, integrated search, and the use of rich media services to provide more compelling (Web-based) user experiences.
The other obvious factor in the shift to open-source BI (and open source in general) is the economics behind the applications and ongoing operations. And perhaps more important is the control, both on-premises and online. As consultant Carlo Daffara noted recently, "the critical aspect is being able to assess this control and weight if the lack of control is compensated by the features you get (which is reasonable) or what kind of risk you are accepting in exchange."
In conversation earlier today, Gentile further asserted, "open-source software is both augmenting and displacing aged, proprietary solutions across industries and at the largest companies. British Telecom is just one example of a company that has realized traditional, proprietary software is just too expensive and too complex. The most aggressive companies figured this out long ago. But now, with heightened economic pressures and the feature maturity of open source, the secret is out and the choice is clear."
There was a time when people would debate whether or not open-source software was reliable enough to support a small office. Those days are long gone. The down economy and maturity of open source are the perfect storm for major disruption.
MySpace today announced a new open-source project called Qizmt, a distributed computation framework developed by its data mining team.
Qizmt is based on the MapReduce distributed processing framework, well-known as a core part of Google's search indexing infrastructure. Qizmt, however, runs on large clusters of Microsoft Windows servers, an interesting sidebar to a computing style we most commonly associate with commodity Linux machines.
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
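The definition above maps directly onto the classic word-count example. The sketch below is a single-process Python illustration of the programming model, not Qizmt's or Hadoop's API: `map_fn` emits intermediate `(word, 1)` pairs, a dictionary plays the role of the shuffle that groups values by key (a real framework does this across many machines), and `reduce_fn` merges the values for each key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def map_fn(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in a document."""
    # doc_id is the input key; word count only needs the value (the text).
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """Reduce: merge all intermediate values that share the same key."""
    return sum(counts)


def run_mapreduce(documents: Dict[str, str]) -> Dict[str, int]:
    # Shuffle: group intermediate values by key in one in-memory dictionary.
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


docs = {"d1": "the quick fox", "d2": "The lazy dog and the fox"}
print(run_mapreduce(docs))
# {'the': 3, 'quick': 1, 'fox': 2, 'lazy': 1, 'dog': 1, 'and': 1}
```

Because each map call and each reduce call depends only on its own input, a framework is free to run them on different machines, which is where the parallelism comes from.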
I spoke with Java architect and distributed systems expert Eugene Ciurana about MapReduce, and he contends that "indexing large amounts of unstructured data is a difficult task regardless of the technologies involved. MapReduce provides a simple, elegant solution for data processing in parallelized systems."
As more sites move to manage large data sets, the uptake of frameworks like MapReduce and projects like Hadoop is sure to grow. And along with the growth of the data comes growth in the market opportunity. Open source is a great way to broaden the adoption curve as users figure out the best way to use these new tools.
Qizmt is currently being used in the MySpace "People You May Know" feature, and will soon expand to user recommendations and other new areas.
Follow me on Twitter @daveofdoom.
Hadoop is the popular open-source implementation of MapReduce, a powerful tool designed for deep analysis and transformation of very large data sets. It enables you to explore complex data, using custom analyses tailored to your information and questions. It's also one of the most buzz-worthy, talked-about open-source projects around.
I spoke with Christophe Bisciglia, Hadoop World organizer and founder of Cloudera, to ask some questions about this inaugural event. And by the way, if you're interested in attending, click on the link in the answer to question No. 5. (My readers get a 25 percent discount by registering before September 15.)
Q: How can you explain the buzz around Hadoop? It's deafening.
A new study on open-source adoption in the business intelligence (BI) market makes it clear that both the benefits and the shortcomings of open-source software are nearly universal across all technology segments.
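One way Hadoop lets you write those custom analyses without Java is Hadoop Streaming: the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines, and the framework sorts the map output by key before the reducer sees it. The Python sketch below imitates that contract in a single process, with `sorted()` standing in for Hadoop's shuffle and sort; the sensor IDs and readings are made up, and in a real job each function would read `sys.stdin` and print to standard output instead.

```python
import itertools
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Turn 'sensor,reading' records into 'sensor<TAB>reading' key/value lines."""
    for line in lines:
        sensor, reading = line.strip().split(",")
        yield f"{sensor}\t{reading}"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Emit the maximum reading per sensor; input must already be sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for sensor, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield f"{sensor}\t{max(float(value) for _, value in group)}"


# Hypothetical input records; sorted() plays the part of Hadoop's shuffle/sort.
records = ["s2,59.98", "s1,60.01", "s2,60.03", "s1,59.97"]
for out in reducer(sorted(mapper(records))):
    print(out)
```

The reducer relies on sorted input so that `groupby` sees each key's lines contiguously, which is exactly the guarantee the sort phase provides.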
According to the study by Third Nature (sponsored by Jaspersoft and Infobright), "the top reason for adopting is still cost savings, although reduced vendor dependence and ease of integration were close to the same level. The limiting of vendor technology lock‐in and freedom from deployment restrictions were key elements of reducing vendor dependence. Some companies used open source deployments as a means of keeping their incumbent vendors honest."
The statement above is hardly unique to BI, but it is perhaps especially germane because BI solutions have for so long been hugely expensive and proprietary. In past discussions, Jaspersoft CEO Brian Gentile has stated that BI is the least agile piece of the enterprise puzzle. Open-source BI solutions mean that customers can take matters into their own hands.
The study also makes some recommendations for evaluating BI and data warehousing tools that, again, are relevant to any open-source product:
- Don't focus solely on cost savings.
- Make open source the default option.
- Plan to augment, not replace, existing software with open source.
- Consider developing open source policies.
- Evaluate open source like any other software.
In the end, software needs to solve business problems. The adoption of open source gives users more alternatives to address their issues, be it cost reduction, increased business agility or just a new way to manage their data.
The Linux Foundation recently released an updated study of Linux development statistics that reveals who actually writes the kernel on which everyone else builds.
More than 70 percent of total kernel contributions come from developers working at companies, including obvious participants like Red Hat, IBM, Novell, and Intel as well as less obvious, smaller companies such as Parallels. The leading contributors included:
- Red Hat: 12.3%
- IBM: 7.6%
- Novell: 7.6%
- Intel: 5.3%
- Independent consultant: 2.5%
- Oracle: 2.4%
- Linux Foundation: 1.6%
- SGI: 1.6%
- Parallels: 1.3%
- Renesas Technology: 1.3%
- Academia: 1.2%
- Fujitsu: 1.1%
- MontaVista: 1.1%
- MIPS Technologies: 1.1%
- Analog Devices: 1.0%
- HP: 1.0%
Another interesting fact is the rate of development and constant refactoring of the kernel code. Every day, an average of 10,923 lines of code are added and 5,547 lines removed, a level of churn that helps keep the code high quality and relevant to the kernel's most important deployments.
Danc at the Lost Garden blog has written up an excellent analysis of why Flash games are great, but represent "the ghetto of the game development industry" in terms of revenue generation.
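The two daily averages also imply how fast the kernel keeps growing. A quick back-of-the-envelope check, assuming the averages hold steady over a full 365-day year (an extrapolation, not a figure from the study):

```python
added_per_day = 10_923    # average lines added per day (from the study)
removed_per_day = 5_547   # average lines removed per day (from the study)

net_per_day = added_per_day - removed_per_day     # net growth per day
net_per_year = net_per_day * 365                  # assumes a steady daily rate
removed_share = removed_per_day / added_per_day   # lines removed per line added

print(net_per_day, net_per_year, round(removed_share, 2))  # 5376 1962240 0.51
```

In other words, roughly one line is removed for every two added, and at that pace the kernel still grows by nearly two million lines a year.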
Compared to the number of players it serves, the Flash game ecosystem makes little money, launches few careers, and sustains few developer owned businesses.
There is too much reliance on advertising and not enough on sustainable paid methods, or "offers" such as subscriptions, in-game consumables, and level un-locking to encourage people to pay--and create an actual business.
There is no need to limit yourself to any single one revenue stream. There are lots of different types of players and each player values something differently. Some players may be willing to buy a t-shirt. Others may want 5 stackable subscriptions. Others may just want a pretty new character with a panda head. When you restrict your game to a single revenue source, you miss out on gaining money from all the different types of customers that would have paid you if you had just given them the right offer.





