May 20, 2005 12:48 PM PDT

Perspective: Rethinking the relational database

The relational database so dominates the thinking of information technology and business professionals that its presumed suitability for essentially all data management tasks is rarely questioned. But it's time to revisit that conventional wisdom.

To be sure, this preeminent position has been well earned, since relational database management systems (RDBMS) provide sophisticated development tools, capabilities for handling frequently changing information, support for a large number of concurrent users, and many other features.

However, the character of much of the data generated by businesses today does not match the strengths of the RDBMS in virtually any respect. This mismatch is revealed within the context of Information Lifecycle Management, or assessing the handling of data from the time of its creation to its obsolescence. ILM is rapidly gaining favor within enterprise IT departments as an effective approach for coping with rapidly growing volumes of corporate data.

Consider two of the hot-button IT issues on the top of everyone's list--the requirements of RFID and Sarbanes-Oxley. From a raw data perspective, they have a great deal in common with other less pervasively covered IT challenges, such as mobile service carriers' call data records or manufacturers' bill-of-material information.

The huge volumes of data these sources generate are related to past business events. This category of data possesses three key characteristics:

1) The data records occur at high transaction rates (usually from automated sources), resulting in a large volume of stored data.

2) The data records never change once they are created.

3) The data records must be saved primarily for historical-reference purposes and will be infrequently (if ever) accessed.

Two weeks of mobile call records easily fill a database to four terabytes (four thousand gigabytes) or more, and this volume will grow ten- to twentyfold for so-called "3G" mobile networks. In the case of RFID records, major retailers and distributors are expected to generate anywhere from tens of terabytes to, by some incredible estimates, millions of terabytes of these records daily.

Herein lies the mismatch. Relational databases--with their transactional, dynamic and multi-user features--come with functionality that far exceeds what's needed for simply storing and accessing write-once/read-maybe business data. This excess functionality requires sizable hardware and software investments that grow in proportion to the amount of data handled. With costs easily in the seven-figure range, even the most well-funded datacenter would have a difficult time spending its way out of this problem.

The answer likely resides in pairing the RDBMS with a complementary technology that is particularly suited to the demands of capturing and storing large volumes of this write-once data. Ironically, a technology previously destined for the history books may well fit current and future requirements perfectly: the flat file.

Long relegated to application-embedded databases and desktop programs, a flat file that borrows a key feature from the relational database--the index--meets all of the requirements previously described for digital-business event data.

In databases, an index speeds up query access to large volumes of data by providing an entry for each field (such as username, phone number, etc.) and the location of the specific matching record(s). Applying an index to a flat file results in a very accessible repository--much more accessible than a tape library--that can respond quickly to enterprise reporting needs. Further, it can do so using comparatively modest server hardware. Coupled with the ever-decreasing cost of disk-based storage, using a flat file becomes a highly cost-effective approach.
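
As a rough illustration of the indexed flat file described here, the following Python sketch appends records to a plain file and keeps a map from a key field to byte offsets, so a lookup can seek directly to matching records instead of scanning the file. The record layout (`phone`, `timestamp`, `seconds`), the file name and the function names are assumptions made for the example, not anything specified in the article, and a production system would presumably keep the index on disk rather than in memory.

```python
import os
import tempfile

# Hypothetical call-record layout, chosen only for this illustration.
FIELDS = ("phone", "timestamp", "seconds")


def append_record(path, index, record):
    """Append one record to the flat file and index its byte offset by phone."""
    line = ",".join(str(record[name]) for name in FIELDS) + "\n"
    with open(path, "ab") as f:
        f.seek(0, os.SEEK_END)  # make the current end-of-file position explicit
        offset = f.tell()
        f.write(line.encode("utf-8"))
    index.setdefault(record["phone"], []).append(offset)


def lookup(path, index, phone):
    """Return the records for one phone number by seeking to indexed offsets."""
    results = []
    with open(path, "rb") as f:
        for offset in index.get(phone, []):
            f.seek(offset)
            values = f.readline().decode("utf-8").rstrip("\n").split(",")
            results.append(dict(zip(FIELDS, values)))
    return results


def demo():
    """Write three records and look up the two that share a phone number."""
    path = os.path.join(tempfile.mkdtemp(), "calls.dat")
    index = {}
    append_record(path, index, {"phone": "555-0100", "timestamp": "2005-05-20T12:48", "seconds": 42})
    append_record(path, index, {"phone": "555-0199", "timestamp": "2005-05-20T12:50", "seconds": 300})
    append_record(path, index, {"phone": "555-0100", "timestamp": "2005-05-20T13:05", "seconds": 7})
    return lookup(path, index, "555-0100")
```

Because records are only ever appended and never updated, the offsets stored in the index stay valid for the life of the file, which is what makes this arrangement a natural fit for write-once data.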

Equally important, moving large volumes of business event data from the RDBMS to a complementary flat-file-based solution enhances the performance of the RDBMS for the tasks it's meant for. At the same time, this approach also delivers on the promise of ILM by putting the right data in the right place for the right cost without sacrificing support for the business.

The time is right to rethink how to deal with the looming explosion in data volumes. The relational database is an impressive technology, but it is also the most expensive way to store large volumes of static data simply to provide for potential access some time in the future.

It is a frequently referenced fact that 80 percent of the data stored in relational databases is never accessed once it is written to the database. For digital business event data, this percentage will be much higher and we know right now that it is unlikely the records will ever be accessed once they are written, let alone be altered by multiple users.

Simply put, the relational database is too much hammer for the digital business-event nail.

Biography
Kate Mitchell is CEO of CopperEye. She previously served as senior vice president for marketing and development at SeeBeyond Technology.

Say what?
by May 20, 2005 6:26 PM PDT
This site is degenerating into meaningless babble.

But that's okay, because Put Down Pete loves environments where babble is the order of the day.

Relational database under attack? Yah? I can't wait for the author's next article, "The continuing benefits of COBOL programs in the 21st century."
Sometimes
by May 21, 2005 8:24 AM PDT
it is better to keep quiet & have people THINK you are an idiot, than to open your mouth & remove all doubt.

Over the last 25 years (started writing software in high school, about the time Bill Gates was setting up M$), there has been a continual battle to capture as much data as possible & then try to find a way to store it (and possibly to retrieve it too). This problem is nothing new.

In 1999, I designed an application for a major credit card company to store all of their transactions for North America, for the last 6 months. It faced similar problems. I created a partitioned DB2 table for the main data repository. Each day's load overlaid the partition with the oldest data. It also updated a table to indicate what data was in each partition. On the rare occasions the data changed, a concurrent Reorg would run (normally saved up for a weekend) and the partition was Reorg'ed. At each change, concurrent image copies were taken & that was all we needed to store & maintain the data.

That approach worked in that situation. But it won't work here - just TOO MUCH data. But looking at the requirements, let's go back to before RDBMSs became so popular and use a set of concatenated tape-based VSAM files! Before technology became so enamoured with adding bells & whistles, VSAM was pretty cool as a basic indexed flat file.

BTW, with things such as OO COBOL (even being used in .NET) and Unix System Services (a product that allows a mainframe program to write a file onto a Unix server, in almost the same way it would write a mainframe file), COBOL is still very much alive & kicking. All those DB2 tables in the EIS Tier have to be maintained somehow & although I had fun writing PL/I & assembler, COBOL is simply better. With (OO) COBOL getting the job done, mainframe Java is not yet essential.
Your user name says it all
by May 23, 2005 3:46 PM PDT
You are rude and are yourself the source of any "degeneration" as you put it. Why don't you keep your comments to yourself unless you can be polite and civilized?
Great Article
by EuroMarkus May 20, 2005 11:33 PM PDT
It's beaten into every new programmer that every piece of data needs to be thrown into a powerful relational database, even though it might be overkill for their project.

There is a reason why SMTP servers use flat files: they scale well, are efficient, they're portable, and no corruption will occur. Ask a MS Exchange Admin who's on his 50th mailstore recovery if he likes the "all eggs in one basket" approach.

Data retention will be an ever growing problem, not just with space allotment, but with future data formats. How do you know in 20 years if the data you have now can even be read? Archiving records in plain-text is the only way to ensure your data can be read 5, 10, even 50 years from now.
Hmm... more a DB Mgmt issue
by alx359 May 21, 2005 12:55 AM PDT
How write-once/maybe read data is represented: flat file/relational DB, seems irrelevant to me. The key issue is how static data is being managed out of the expensive resources. For example, one could unmount and archive such static data as monthly/quarterly/etc DB's in a batch job and start again with empty ones. Querying them later shall be backed by automation:
- data request (time-framed)
- get related DB file(s) from tape and cache them to disk
- mount DB file(s)
- exec query against mounted DB's
- unmount DB file(s)
- save updated DB's back to tape (if needed).

Of course, such automation may require some middle steps to be fast/reliable enough, but one gets the idea.
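
The automation listed above can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: SQLite files stand in for the monthly DB files, an ordinary directory stands in for the tape library, and the table layout, file naming scheme and function names are all hypothetical.

```python
import os
import shutil
import sqlite3
import tempfile


def archive_month(archive_dir, month, rows):
    """Write one month's static records to its own DB file (stand-in for tape)."""
    path = os.path.join(archive_dir, f"events_{month}.db")
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE events (day TEXT, item TEXT)")
    con.executemany("INSERT INTO events VALUES (?, ?)", rows)
    con.commit()
    con.close()


def query_archive(archive_dir, cache_dir, months, sql, params=()):
    """Run one query against each requested month's archived DB file."""
    results = []
    for month in months:  # the time-framed data request
        name = f"events_{month}.db"
        cached = os.path.join(cache_dir, name)
        shutil.copy(os.path.join(archive_dir, name), cached)  # fetch and cache to disk
        con = sqlite3.connect(cached)  # "mount" the DB file
        try:
            results.extend(con.execute(sql, params).fetchall())  # exec query
        finally:
            con.close()  # "unmount"
        os.remove(cached)  # read-only query, so nothing to save back
    return results


def demo():
    """Archive two months of records, then query both for a single item."""
    archive_dir = tempfile.mkdtemp()
    cache_dir = tempfile.mkdtemp()
    archive_month(archive_dir, "2005-04", [("2005-04-03", "pallet-17")])
    archive_month(archive_dir, "2005-05", [("2005-05-20", "pallet-17"),
                                           ("2005-05-21", "pallet-99")])
    return query_archive(
        archive_dir, cache_dir, ["2005-04", "2005-05"],
        "SELECT day FROM events WHERE item = ? ORDER BY day", ("pallet-17",))
```

The sketch leaves out the last step in the list (saving updated DB's back to tape), since a query against write-once data changes nothing.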
Gotta disagree
by May 21, 2005 9:05 AM PDT
Alex - when you see the overhead that comes with an RDBMS, it's pretty huge, and it's pointless, when the user doesn't need all the bells & whistles the RDBMS provides.

It's OK to ask the DBA/data modeler to optimize their designs. But if the data doesn't need to go in a DB, you may be talking to the wrong people.
Maybe you two can co-write the article on COBOL, lol
by May 21, 2005 1:40 PM PDT
Myopia.
Old does not mean bad
by amadensor May 23, 2005 7:37 AM PDT
Yes, COBOL is old. It is often not the right tool for the job. Its syntax is verbose, and it works best with flat files rather than databases. There are, however, instances where this is exactly what you need. COBOL is fast at processing very large numbers of records, and is also very good at handling fixed width files. This is an instance of using the right tool for the job. VSAM could be good for this sort of data as long as the newly inserted data will not fall in the middle of the key range. VSAM's index structures are not good for this. CA splits and CI splits have the potential of killing performance.

Just because something is old does not mean it is obsolete junk. Pete, I guess you have given up on the wheel and fire.
Perhaps....but consider the overheads associated with File Systems
by msc001 May 21, 2005 7:50 PM PDT
I think Kate's point about RDBMS is well taken - relational databases of today are grossly insufficient for the amounts of data that is being generated. I would however suggest that a better solution is a proprietary data-well / data-cache built on a proprietary OS whose only purpose is to store transactional data. No need for UI's, complex process managers, etc. - just a data cache, messaging API, and a transport.
sweetheart, you're showing your ignorance
by May 22, 2005 7:22 AM PDT
To think that the main purpose of an rdbms is to only do
transactions shows just how much you don't know. Major
rdbms engines do:
-marshalling resources for thousands of users
-security
-sql query parsing and optimization
-data abstraction and normalization
-recovery of data from system failure (hardware & software)
-relational mathematics to join disparate data structures
-advanced connectivity of a variety of user platforms
-advanced MPP processing & parallelization of queries
against partitioned data structures.

to name but a few.....

I doubt you came up with this idea on your own, but
whoever put you up to it is demonstrating both of your
ignorance. Go back to doing marketing for Spencers.
Leave the complicated stuff to professionals...
Agree, with additional viewpoint
by May 22, 2005 10:12 AM PDT
It is true that the relational database is oversold. I won't address the large data volume requirement, but will identify another aspect of the mismatch of RDBMSs: Relational databases are an *extremely poor* match for object-oriented applications. The use of relational databases easily doubles if not triples the cost of OO projects, and cripples the performance that could be achieved if OO databases were used for those kinds of applications. Relational databases are pushed on organizations because DBA staff won't consider alternatives and infrastructure managers want to play it safe. As a result, all their internal software projects suffer. Relational databases are good for ad-hoc queries, which accountants like, but they are a nightmare for OO design.
in reference to your OO comment
by May 22, 2005 6:51 PM PDT
see www.hibernate.org

It is a high performance object/relational persistence
service for JAVA applications.

Most professionals are using it with great success, in
conjunction with Websphere or Weblogic.

Also, your complaint about performance doesn't wash,
consider Walmart uses an rdbms as their data warehouse,
which has about 50 TB or more.... This is where MPP
processing comes into play...

All of the major rdbms vendors have MPP / Grid scalability
as well as initiatives on RFID data. Use Google to query, for
instance, type in: Teradata RFID. or Oracle RFID, or Sybase
RFID... you'll see that these companies are investing very
heavily, as are all of the storage vendors (Network
Appliance, EMC, etc...).
Why, Cliff?
by David Arbogast May 23, 2005 7:42 AM PDT
If you were making the architecture decision, what alternative would you go with?

<<The use of relational databases easily doubles if not triples the cost of OO projects, and cripples the performance that could be achieved if OO databases were used for those kinds of applications. >>
Craziness...
by David Arbogast May 23, 2005 7:39 AM PDT
RDBMS is unsuitable for "WriteOnce/ReadMaybe?"

Okay... first of all, there is no such thing as "ReadMaybe." If you "Might" have to read the data, then you need read capability. This is a WriteOnce/ReadMany scenario, where not all data may be read. Nothing unique here.

Overhead? The author wants to use FLAT FILES "instead" of a relational database? Silly. First of all, you can create "flat tables" in a relational database. There is the really interesting concept some of us have heard of... it's called "Data Warehousing." And it addresses the vast majority of shortcomings in transaction-based databases for this type of storage. Ask a good DBA. They'll tell you that database designs can vary widely depending on the expected data usage. Apparently, the author thinks that RDBMS systems are only for transaction processing.

But let's look at the other example here.... recording thousands of 3G telephone calls? WHY? What company creates voice-recording of EVERY phone call right now? Is this REALLY a practical application? Even RFID applications only store a small code. I assume somebody probably records phone calls digitally, though... and I assume, they are using a DATABASE in the most robust of implementations. Their reason for taking this approach is likely the same reason they'll instantly reject the idea of flat-files for data storage: Raw, flat-file IO is more costly than database lookups in terms of system resources. Read: LESS PERFORMANCE.

Now, I'm not hung up on relational databases... with some of the object-oriented work going on, and more recently, aspect-oriented data storage, there are certainly alternative ways to contemplate storing data. But if you are going to suggest that NEW storage systems are needed, to meet NEW demands, the last thing you want to do is to suggest OLD technology that is far less suited for the job than a RDBMS.
what ever happened to isam?
by lizardlists May 23, 2005 7:58 AM PDT
For my shop, it's true, 80% or more of RDBMS data never gets read. We generate massive amounts of data that only needs one or two keys to make it accessible. We read it, sum it up, look for exceptions and spit out a message or two.

It's not that RDBMS is bad, it's just that it's not a good fit for simple but big data, and that if you are stuck with Oracle it's damn expensive for what you actually use.
This May be one of the Dumbest Articles Ever
by May 23, 2005 9:29 AM PDT
Who wrote this dreck? Whoever did doesn't have a clue what s/he is pontificating about.

If the CEO whose name is on the byline wrote it, recommendation to investors: get out quick.

If the CEO whose name is on the byline DIDN'T write the article, recommendation to CEO: FIRE the author. They don't know a thing.
She's almost on the money...
by nazzdeq May 23, 2005 10:31 AM PDT
Most of the replies are made by people who don't seem to work in places with serious volumes of data in the TB or PB range, therefore you fail to see the problem.

If you guys were in charge, you would have made Google run on Oracle, queries would come back in a week and they would have never gotten off the ground due to horrible performance and pathetic scalability. So much for your RDBMS vision.

They store Petabytes of data in flat files, not relational databases. They do this for a reason.

RDBMSs are tools that are good for certain situations, but not all of them. The type of data the author of the article mentioned is one of them where RDBMSs are not suitable.

Yeah, sure you can partition the database, offline the partitions and bring 'em back when you need them. This is way too slow and cumbersome just to look up a single record.

The bigger problem is that corporations are data packrats. In most situations all this data is useless and mining 5 year old data to find out a conclusion that may not be valid anymore is also idiotic.

Of course, in some industries there are legal reasons you have to retain the data and doing so in an RDBMS can be expensive and slow.
only reasonable comment here
by confucious_says May 23, 2005 9:10 PM PDT
thanks for droppin knowledge....it wasn't so long ago i used to do the same, but then i thought, why bother?
still not see any money at all
by alx359 May 23, 2005 9:38 PM PDT
Your reasoning sounds inconsistent and out of context.

Of course bringing offline/online partitions is time consuming, but after all the whole point of this story was the write once/read maybe scenario. Our proposed approach in the comment above should keep the physical layer homogeneous regardless of TB/PB ranges and the existing level of partitioning/clustering. If you're going to use (archived) flat files for a search, it's going to be slow as well, and more cumbersome to manage if that is not your native format.

The google story is interesting by itself but it doesn't relate to this story either. Data there is write/read a lot.

Retrospective Data Mining is simply an option and depends on the type of business and data being collected. Sometimes patterns are a statistical event and it could take a considerable amount of time to collect that data. I find nothing "idiotic" in being able to do research on old data. What's the point of collecting it if it is not usable when/if needed?

As said by others, OODBMS is still a very promising approach, but it consists of adding more structure to the DBMS, not removing it. So future sounds not very promising for a bunch of dull flat files.
Talk is cheap; Show us your FLAT FILE GENIUS
by May 23, 2005 2:03 PM PDT
Products talk.

We have a marketplace to sort out the nonsense of all this.

If Oracle is nonsense, show us the better way.
*sigh
by pcLoadLetter May 24, 2005 1:53 PM PDT
Some people just can't be made to learn how to comprehend what they read.

The point is that relational databases have been made out to be the best way, period. That is wrong. Sometimes it is the best solution and sometimes it is not. But yet, many think it is, always.

Google is a perfect example of why relational databases are not always the best way to go.

Flat file may be 'boring', but it is often as fast and efficient as anything else, many times it is faster by a lot. Same goes for a database that persistently stores objects.

This whole discussion reminds me of the #1 rule of professional, efficient programming: "know your data".

If you don't understand the implications to that rule, then you will be condemned to use what is more popular, without knowing whether it is the best solution for you in any specific case.
Embedded? Object database technology!
by May 23, 2005 4:36 PM PDT
It is clear that relational database technology cannot serve all masters. More than 200,000 downloads speak proof that, in Java and .NET environments, the right embedded database technology is a native object database, available as free open source software from http://www.db4o.com.
ALL you NEED is MySQL
by May 26, 2005 5:55 AM PDT
Try mysql - fast, simple and low cost solution!
Data (Unstructured and Structured) - Way to Go
by May 29, 2005 11:05 PM PDT
Hi,

In both cases data volumes are very high.

Structured:
This type of data is already structured (sits in Databases), there is very minimal scope for rearranging and giving a unique context for the data.

Unstructured:
This type of data can deliver a set of data depending on the context.

The Solution:
Any form of data can be retrieved and arranged with more context-driven tools, which in turn maintain only the context of each end-user.

The static data which never changes or which has a permanent scope can be made available in database and the remaining data can be there as "unstructured data". Which are driven by "context engines".

Follow the principle of 80-20, where 20% of static data moved into database and retain 80% of data in unstructured way... which serves all type of contexts..!!

- Thanks
Ramesh N T
Trite nonsense
by ERK August 15, 2005 9:10 AM PDT
Your article uses common (as in vulgar) industry buzzwords to attempt to justify unconditional acceptance of "trends." That's not what I'd call "rethinking". While it's always useful to attack orthodoxies, one needs some real weapons (in this case reason) to do so.

> However, the character of much of the data
> generated by businesses today does not match the
> strengths of the RDBMS in virtually any respect.

Completely false - the cases where XML and OODBMS might make some sense are the niches.

> This mismatch is revealed within the context of
> Information Lifecycle Management, or assessing
> the handling of data from the time of its
> creation to its obsolescence.

Giving something an acronym (ILM) does not legitimize it, nor make it a useful field of study.

> ILM is rapidly gaining favor within enterprise
> IT departments as an effective approach for
> coping with rapidly growing volumes of corporate
> data. The time is right to rethink how to deal
> with the looming explosion in data volumes.

So ILM has happened prior to this "rethinking" that we need. And yet you're using the admittedly ad hoc ILM to justify a "rethinking"?

> Relational databases--with their transactional,
> dynamic and multi-user features--come with
> functionality that far exceeds what's needed for
> simply storing and accessing write-once /
> read-maybe business data.

"Relational" has nothing to do with the fact that you're describing (I guess) limitations on current SQL DBMSs. DBMS != "database". Relational != SQL.

> This excess functionality requires sizable
> hardware and software investments that grow in
> proportion to the amount of data handled.

You're saying that because I have some extra subroutines and such, that this requires not only linearly-scaled hardware, but additional software? How do you justify this complete non sequitur and mischaracterization of DBMS software?

Besides, if current DBMSs are doing things badly, they should improve. There are no fundamental theoretical issues here - merely bad implementations which have nothing to do with "the nature of data."

> The answer likely resides in pairing the RDBMS
> with a complementary technology that is
> particularly suited to the demands of capturing
> and storing large volumes of this write-once
> data. Ironically, a technology previously
> destined for the history books may well fit
> current and future requirements perfectly: the
> flat file.

Any DBMS worth its salt should be able to offload infrequently accessed data to another disk or even a flat file, without any modification to the query language. A simple concept in computer systems: HIDE IT FROM THE USERS. Make whatever optimizations you must, but make the interface clean and seamless.

> Long relegated to application-embedded databases
> and desktop programs, a flat file that borrows a
> key feature from the relational database--the
> index--meets all of the requirements previously
> described for digital-business event data.

Except speed, query languages, logical organization, physical organization, separation of logical and physical, etc. etc. etc.

> At the same time, this approach also delivers on
> the promise of ILM by putting the right data in
> the right place for the right cost without
> sacrificing support for the business.

And why exactly can't the DBMS vendors update their software to hide this manifestation of physical storage from users? Sheer laziness, if anything. This is a burden for the DBMS vendors.

> The time is right to rethink how to deal with
> the looming explosion in data volumes. The
> relational database is an impressive technology,
> but it is also the most expensive way to store
> large volumes of static data simply to provide
> for potential access some time in the future.

Relational has nothing to do with this. You are confusing the implementation with the model; the SQL "model" with the relational model; and physical storage handlers with the DBMS itself. Give this some more thought, I beg you, for the sake of your readers.

> It is a frequently referenced fact that 80
> percent of the data stored in relational
> databases is never accessed once it is written
> to the database.

Nonsense, but if it were true... which 80%?

> Simply put, the relational database is too much
> hammer for the digital business-event nail.

A trite bit of nonsense. Better implementations, not "models" of I.T. like "ILM", are the rational solution.