"Database" has come to be largely synonymous with a relational database management system (RDBMS) or, more specifically, a relational database that is accessed using the SQL query language. Some simpler products run on desktops, but if you are talking about products used for serious business computing on a server, SQL it is. The widespread adoption of open-source products such as MySQL and PostgreSQL only cemented SQL's dominance by making it available to a broad audience that couldn't afford licensing fees for products from Oracle and other large database vendors.
An RDBMS stores data in multiple tables that are related to one another by keys; a primary key uniquely identifies each row within its table. IBM's Edgar Codd coined and defined the term "relational database" in a 1970 paper. Products based on this database model came to largely replace a variety of hierarchical and other approaches. Although the relational model could perform worse than the alternatives, it tended to offer more flexibility in how data could be laid out, added, and accessed.
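To make the idea of tables related by keys concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, their columns, and the sample rows are invented for illustration; they do not come from any product discussed here.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two tables linked by a key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    # Each customer row is identified by a unique primary key.
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    # Each order points back at a customer through customer_id.
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id),"
        " total REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.executemany(
        "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
        [(10, 1, 25.0), (11, 1, 17.5), (12, 2, 40.0)],
    )
    conn.commit()
    return conn


def totals_by_customer(conn):
    """Join the two tables on the key and sum each customer's orders."""
    return conn.execute(
        "SELECT c.name, SUM(o.total)"
        " FROM customers c JOIN orders o ON o.customer_id = c.id"
        " GROUP BY c.id, c.name"
        " ORDER BY c.name"
    ).fetchall()
```

Calling totals_by_customer(build_demo_db()) returns one row per customer. The flexibility described above comes from the fact that the same two tables can be joined and grouped in other ways without changing how the data is stored.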
As computer systems got faster (and SQL RDBMSs were enhanced in many ways), concerns about the performance of the basic approach largely receded into the background. In general, efforts to displace RDBMSs--such as object databases--have generated a lot of hype for a time but have remained very much niche products.
However, with the advent of truly massive-scale distributed computing infrastructures, we're starting to see significant adoption of technologies that don't necessarily replace RDBMSs, but certainly complement them.
The basic issue is that RDBMSs are architected to process and store all transactions with absolute reliability. (ACID--atomicity, consistency, isolation, and durability--is a set of properties commonly used to describe the requirements.) This is a good thing when we're talking about, say, financial transactions. A bank balance has to immediately reflect a withdrawal; the system has to prevent multiple withdrawals of the same balance from happening simultaneously.
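The bank withdrawal case can be sketched with a transaction. This is a minimal illustration using Python's sqlite3 module, not how any particular bank or RDBMS product implements it; the accounts table and the open_bank, withdraw, and balance helpers are hypothetical names.

```python
import sqlite3


def open_bank(initial_balance):
    """Create an in-memory database holding one account (id 1)."""
    # isolation_level=None puts the connection in autocommit mode so
    # that the transaction boundaries below are fully explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute(
        "INSERT INTO accounts (id, balance) VALUES (1, ?)", (initial_balance,)
    )
    return conn


def withdraw(conn, account_id, amount):
    """Withdraw atomically; return False if funds are insufficient."""
    # BEGIN IMMEDIATE takes the write lock up front, so two withdrawals
    # cannot both read the old balance and both succeed.
    conn.execute("BEGIN IMMEDIATE")
    try:
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ?"
            " WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
        succeeded = cur.rowcount == 1
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT" if succeeded else "ROLLBACK")
    return succeeded


def balance(conn, account_id):
    """Return the current balance for the given account."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]
```

With an opening balance of 100, a first withdrawal of 70 succeeds and a second withdrawal of 70 is refused, leaving 30. The check and the update happen in one statement inside one transaction, which is the kind of guarantee the ACID properties describe.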
RDBMSs and their associated infrastructure also tend to reflect the assumption that data will be retained for a significant period. Again, this makes a lot of sense in the context of the traditional role of databases. A business not only wants to keep transaction records for at least several years--in many cases, it's legally required to do so.
However, we're seeing the increased use of alternative approaches in large distributed systems that don't have as stringent consistency requirements or that generate lots of intermediate results that don't need to be stored permanently. In exchange for relaxing those guarantees, they can replicate data widely to maximize performance and availability.
One form this takes is "eventual consistency," which Amazon CTO Werner Vogels describes as tolerating inconsistency for "improving read and write performance under highly concurrent conditions and handling partition cases where a majority model would render part of the system unavailable even though the nodes are up and running." Vogels has written a paper on the topic.
Amazon SimpleDB implements such a model. It "keeps multiple copies of each domain. When data is written or updated (using PutAttributes, DeleteAttributes, CreateDomain or DeleteDomain) and Success is returned, all copies of the data are updated. However, it takes time for the update to propagate to all storage locations. The data will eventually be consistent, but an immediate read might not show the change."
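The behavior described in that passage can be imitated with a toy model. The sketch below is not SimpleDB's implementation or API; the EventuallyConsistentStore class and its put, get, and propagate methods are hypothetical, and propagation is triggered by hand so the stale read is easy to see.

```python
class EventuallyConsistentStore:
    """Toy key-value store with several replicas and delayed propagation."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        # Updates that have been accepted but not yet copied everywhere,
        # stored as (replica_index, key, value).
        self.pending = []

    def put(self, key, value):
        """Accept a write on replica 0 and queue it for the others."""
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))
        return "Success"

    def get(self, key, replica=0):
        """Read from one replica; the answer may be stale."""
        return self.replicas[replica].get(key)

    def propagate(self):
        """Apply all queued updates, in order, to the other replicas."""
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()
```

After put("k", "v1"), a read from replica 0 sees the new value while a read from replica 2 still returns None. Once propagate() has run, every replica agrees. A real system does the propagation in the background, which is why an immediate read might not show the change.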
We're also seeing products that essentially augment RDBMSs by reducing the volume of data that they need to store. Terracotta is a commercial product that provides distributed caching for Java applications. An example could be a travel reservation application where the actual "books" need to go into an RDBMS but many of the transactions associated with "looks" can be handled in a distributed way without touching the database every time. Terracotta says that they can frequently offload 40 percent to 60 percent of transactions.
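The looks-versus-books split can be sketched as a simple cache placed in front of the database. This is an illustration of the general idea only, written in Python rather than Java, and it is not Terracotta's API; FakeDatabase, CacheAside, and the flight key are invented names.

```python
class FakeDatabase:
    """Stand-in for an RDBMS that counts how often it is read."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def fetch(self, key):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key, value):
        self.rows[key] = value


class CacheAside:
    """Serve repeated looks from memory; send every book to the database."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def look(self, key):
        # Only the first look for a key reaches the database.
        if key in self.cache:
            return self.cache[key]
        value = self.db.fetch(key)
        self.cache[key] = value
        return value

    def book(self, key, value):
        # A booking must be recorded durably, so it always hits the
        # database. The cached copy is dropped so later looks are fresh.
        self.db.write(key, value)
        self.cache.pop(key, None)
```

In this sketch, ten looks at the same flight cost a single database read. A booking then invalidates the cached entry, so the next look goes back to the database once and sees the updated seat count.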
Memcached, an open-source distributed memory caching system, is conceptually similar. It distributes data (together with an associated structure to look up that data) across multiple systems to reduce accesses to external data stores. It is widely used at large Web sites such as Twitter, YouTube, and Wikimedia.
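One way to picture data being spread across systems together with a way to find it again is to hash each key to a node. The sketch below is an assumption-laden illustration, not Memcached's code or client API: ShardedCache is a hypothetical class, the nodes are plain dictionaries, and simple modulo hashing stands in for whatever scheme a real client library uses.

```python
import hashlib


class ShardedCache:
    """Toy cache that spreads keys across named in-memory nodes."""

    def __init__(self, node_names):
        if not node_names:
            raise ValueError("at least one node is required")
        self.node_names = list(node_names)
        # Each node is modeled as its own dictionary.
        self.nodes = {name: {} for name in self.node_names}

    def node_for(self, key):
        """Map a key to a node deterministically by hashing it."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.node_names)
        return self.node_names[index]

    def set(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key):
        # The same hash finds the node again; no central index is needed.
        return self.nodes[self.node_for(key)].get(key)
```

Because every client computes the same hash, any of them can locate a key without asking a coordinator. The drawback of plain modulo hashing is that adding or removing a node remaps most keys, which is one reason production clients often use more elaborate schemes.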
These techniques and technologies don't replace RDBMSs in the way that RDBMSs replaced older technologies such as hierarchical databases. Rather, they trade off characteristics that have been considered non-negotiable must-haves in the realm of database design such as full consistency. As a result, they can't be used instead of RDBMSs for the situations where those characteristics truly are requirements.
However, a lot of software is more asynchronous and read-intensive than traditional business applications. It doesn't have the same consistency constraints, but it does need to scale performance massively across many systems. For the organizations implementing that software, pairing RDBMSs with distributed data stores of various forms isn't just the right architectural approach; it may be the only way they can reach the scale they need at a price point that makes business sense.
On the heels of yesterday's Steve Jobs keynote at Macworld, Apple may be the tech company that's top of mind for many readers. However, from an enterprise computing perspective, Sun Microsystems' announcement that it is acquiring MySQL is far more pertinent. News.com's Martin LaMonica summarizes the announcement thusly:
Sun Microsystems will pay $1 billion to buy MySQL, the provider of a popular open-source database.
Sun said Wednesday that it will pay about $800 million in cash for MySQL's stock and take on about $200 million worth of options. MySQL CEO Marten Mickos will join Sun's senior executive team after the transaction closes.
The acquisition is a bold move for Sun, which has embraced open-source software and development practices in an effort to garner more revenue from its software business. Until now, it has sold support services for a competing open-source database, PostgreSQL.
MySQL is one of the most successful open-source companies founded in the past five years. It's part of the popular combination of open-source development products referred to as LAMP, for Linux Apache Web server, MySQL, and the PHP development language, which is broadly used on the Internet and within companies.
Here, I want to focus on one specific implication.
MySQL is the clear category leader in open-source databases; it's the "M" in the LAMP stack that also includes Linux, the Apache Web server, and the Python, Perl, and PHP scripting languages. And LAMP underpins a huge portion of the open-source software world. As a result, MySQL--like JBoss before it was acquired by Red Hat--made a nice little business of selling support subscriptions for its software. Indeed, it was one of the more successful of the more-or-less pure standalone open-source companies.
If that sounds like damning with faint praise, it is a bit. Because so few end users tend to buy support contracts relative to the number of people that use the product, pure open source has been a challenging business model for its practitioners. That's not to say that there aren't companies successfully taking such an approach, but there are no pure open-source Oracles, Microsofts, or VMwares raking in the dough.
Small software companies get bought by larger companies all the time, of course. Open source or not, enterprise customers often appreciate the sort of global support that large vendors are better prepared to offer. They also appreciate the ability to put together sets of products that address broad business problems. However, in the case of open source specifically, the fact that a large vendor can leverage open-source products to sell other software and even hardware creates far more revenue opportunities than when the only thing a company can sell is a support contract on a single piece of software.
In the Sun and MySQL case, for example, one can imagine Sun eyeing the vast population of MySQL users not so much for the opportunity to sell MySQL support contracts but as an entree for selling other Sun middleware, Solaris, and Sun hardware. One can imagine a conversation like this repeated many times: "Oh, you need better performance out of MySQL running on Linux? Of course, we're happy to help. But you might think about Solaris because we have this DTrace tool. We also have this ZFS file system. And, oh, have you heard about Thumper?"
It's not so much that there aren't workable business models around open source. But life is so much easier when those models can include pieces that people have to pay for as well.





