
May 20, 2005 12:48 PM PDT

Perspective: Rethinking the relational database


Biography
Kate Mitchell is CEO of CopperEye. She previously served as senior vice president for marketing and development at SeeBeyond Technology.


32 comments

Say what?
This site is degenerating into meaningless babble.

But that's okay, because Put Down Pete loves environments where babble is the order of the day.

Relational database under attack? Yah? I can't wait for the author's next article, "The continuing benefits of COBOL programs in the 21st century."
Posted by (88 comments )
Reply Link Flag
Sometimes
it is better to keep quiet & have people THINK you are an idiot, than to open your mouth & remove all doubt.

Over the last 25 years (started writing software in high school, about the time Bill Gates was setting up M$), there has been a continual battle to capture as much data as possible & then try to find a way to store it (and possibly to retrieve it too). This problem is nothing new.

In 1999, I designed an application for a major credit card company to store all of their transactions for North America, for the last 6 months. It faced similar problems. I created a partitioned DB2 table for the main data repository. Each day's load overlaid the partition with the oldest data. It also updated a table to indicate what data was in each partition. On the rare occasions the data changed, a concurrent Reorg would run (normally saved up for a weekend) and the partition was Reorg'ed. At each change, concurrent image copies were taken & that was all we needed to store & maintain the data.
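
The rolling-partition scheme described above can be sketched in a few lines. This is only an illustration in plain Python, not DB2: the class name `RollingPartitions`, its methods, and the in-memory lists standing in for partitions are all assumptions made for the example.

```python
from datetime import date, timedelta


class RollingPartitions:
    """Toy model: a fixed set of partitions where each new day's load
    overwrites the partition holding the oldest data (hypothetical names)."""

    def __init__(self, n_partitions):
        self.partitions = [[] for _ in range(n_partitions)]
        # Stand-in for the tracking table: partition number -> date loaded.
        self.catalog = {}

    def load_day(self, day, rows):
        if len(self.catalog) < len(self.partitions):
            slot = len(self.catalog)  # still filling empty partitions
        else:
            # Reuse the partition whose recorded load date is oldest.
            slot = min(self.catalog, key=self.catalog.get)
        self.partitions[slot] = list(rows)
        self.catalog[slot] = day
        return slot

    def rows_for(self, day):
        """Return the rows loaded for a given day, or None if overwritten."""
        for slot, loaded in self.catalog.items():
            if loaded == day:
                return self.partitions[slot]
        return None
```

With three partitions, the fourth daily load lands in partition 0 again and the first day's rows are no longer retrievable, which is the retention window behaving as intended.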

That approach worked in that situation. But won't work here - just TOO MUCH data. But looking at the requirements, let's go back before RDBMSs became so popular, to use a set of concatenated tape-based VSAM files! Before technology became so enamoured with adding bells & whistles, VSAM was pretty cool as a basic indexed flat file.

BTW, with things such as OO COBOL (even being used in .NET) and Unix System Services (a product that allows a mainframe program to write a file onto a Unix server, in almost the same way it would write a mainframe file), COBOL is still very much alive & kicking. All those DB2 tables in the EIS Tier have to be maintained somehow & although I had fun writing PL/I & assembler, COBOL is simply better. With (OO) COBOL getting the job done, mainframe Java is not yet essential.
Posted by (409 comments )
Link Flag
Your user name says it all
You are rude and are yourself the source of any "degeneration" as you put it. Why don't you keep your comments to yourself unless you can be polite and civilized?
Posted by (7 comments )
Link Flag
Great Article
It's beaten into every new programmer that every piece of data needs to be thrown into a powerful relational database, even though it might be overkill for their project.

There is a reason why SMTP servers use flat files: they scale well, are efficient, they're portable, and no corruption will occur. Ask a MS Exchange Admin who's on his 50th mailstore recovery if he likes the "all eggs in one basket" approach.

Data retention will be an ever growing problem, not just with space allotment, but with future data formats. How do you know in 20 years if the data you have now can even be read? Archiving records in plain-text is the only way to ensure your data can be read 5, 10, even 50 years from now.
Posted by EuroMarkus (10 comments )
Reply Link Flag
Hmm... more a DB Mgmt issue
How write-once/maybe read data is represented: flat file/relational DB, seems irrelevant to me. The key issue is how static data is being managed out of the expensive resources. For example, one could unmount and archive such static data as monthly/quarterly/etc DB's in a batch job and start again with empty ones. Querying them later shall be backed by automation:
- data request (time-framed)
- get related DB file(s) from tape and cache them to disk
- mount DB file(s)
- exec query against mounted DB's
- unmount DB file(s)
- save updated DB's back to tape (if needed).

Of course, such automation may require some middle steps to be fast/reliable enough, but one gets the idea.
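
The steps listed above can be sketched roughly as follows. This is a minimal illustration, assuming each period is archived as its own SQLite file and that an ordinary directory stands in for tape; `make_archive`, `query_archived`, the `events` table, and the period naming are all hypothetical. The optional final step (saving updated files back to tape) is left out because the query is read-only.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def make_archive(archive_dir, period, rows):
    """Create one per-period database file (stand-in for a DB file on tape)."""
    path = Path(archive_dir) / f"{period}.db"
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE events (ts TEXT, payload TEXT)")
    con.executemany("INSERT INTO events VALUES (?, ?)", rows)
    con.commit()
    con.close()
    return path


def query_archived(archive_dir, cache_dir, periods, where_sql, params=()):
    """Fetch each requested period, mount it, query it, and unmount it."""
    results = []
    # Autocommit mode, so ATTACH/DETACH never run inside an open transaction.
    con = sqlite3.connect(":memory:", isolation_level=None)
    try:
        for i, period in enumerate(periods):
            cached = Path(cache_dir) / f"{period}.db"
            # "Get related DB file(s) from tape and cache them to disk".
            shutil.copy(Path(archive_dir) / f"{period}.db", cached)
            alias = f"p{i}"
            con.execute(f"ATTACH DATABASE ? AS {alias}", (str(cached),))  # mount
            results += con.execute(
                f"SELECT ts, payload FROM {alias}.events WHERE {where_sql}",
                params,
            ).fetchall()
            con.execute(f"DETACH DATABASE {alias}")  # unmount
            cached.unlink()  # free the disk cache
    finally:
        con.close()
    return results
```

Note that `where_sql` is pasted into the statement, so in this sketch it must come from trusted code; only the values in `params` are bound safely.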
Posted by alx359 (40 comments )
Reply Link Flag
Gotta disagree
Alex - when you see the overhead that comes with an RDBMS, it's pretty huge, and it's pointless when the user doesn't need all the bells & whistles the RDBMS provides.

It's OK to ask the DBA/data modeler to optimize their designs. But if the data doesn't need to go in a DB, you may be talking to the wrong people.
Posted by (409 comments )
Link Flag
Maybe you two can co-write the article on COBOL, lol
Myopia.
Posted by (88 comments )
Reply Link Flag
Old does not mean bad
Yes, COBOL is old. It is often not the right tool for the job. Its syntax is verbose, and it works best with flat files rather than databases. There are, however, instances where this is exactly what you need. COBOL is fast at processing very large numbers of records, and is also very good at handling fixed width files. This is an instance of using the right tool for the job. VSAM could be good for this sort of data as long as the newly inserted data will not fall in the middle of the key range. VSAM's index structures are not good for this. CA splits and CI splits have the potential of killing performance.
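
To make the point about mid-range inserts concrete, here is a toy model of fixed-capacity sorted blocks. It is not VSAM and does not reproduce real CI/CA behavior; the function `count_splits`, the block capacity, and the rule that keys past the current end simply start a fresh block are simplifying assumptions for illustration.

```python
import bisect


def count_splits(keys, capacity=4):
    """Insert keys into sorted fixed-capacity blocks and count block splits."""
    blocks = []  # list of non-empty sorted lists, kept in key order
    splits = 0
    for key in keys:
        if not blocks:
            blocks.append([key])
            continue
        highs = [b[-1] for b in blocks]  # highest key in each block
        i = bisect.bisect_left(highs, key)
        if i == len(blocks):
            # Key is beyond the current end: fill the last block,
            # or start a new one. No existing block is disturbed.
            if len(blocks[-1]) < capacity:
                blocks[-1].append(key)
            else:
                blocks.append([key])
            continue
        # Key falls inside the existing range: it must go into block i.
        block = blocks[i]
        bisect.insort(block, key)
        if len(block) > capacity:
            mid = len(block) // 2
            blocks[i:i + 1] = [block[:mid], block[mid:]]  # split in two
            splits += 1
    return splits
```

In this model, strictly ascending keys (such as timestamps on append-only event data) never cause a split, while keys arriving in descending or arbitrary order force repeated splits of already-full blocks.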

Just because something is old does not mean it is obsolete junk. Pete, I guess you have given up on the wheel and fire.
Posted by amadensor (234 comments )
Link Flag
Perhaps....but consider the overheads associated with File Systems
I think Kate's point about RDBMS is well taken - relational databases of today are grossly insufficient for the amounts of data that is being generated. I would however suggest that a better solution is a proprietary data-well / data-cache built on a proprietary OS whose only purpose is to store transactional data. No need for UI's, complex process managers, etc. - just a data cache, messaging API, and a transport.
Posted by msc001 (2 comments )
Reply Link Flag
sweetheart, you're showing your ignorance
To think that the main purpose of an rdbms is to only do
transactions shows just how much you don't know. Major
rdbms engines do:
-marshalling resources for thousands of users
-security
-sql query parsing and optimization
-data abstraction and normalization
-recovery of data from system failure (hardware & software)
-relational mathematics to join disparate data structures
-advanced connectivity of a variety of user platforms
-advanced MPP processing & parallelization of queries
against partitioned data structures.

to name but a few.....

I doubt you came up with this idea on your own, but
whoever put you up to it is demonstrating both of your
ignorance. Go back to doing marketing for Spencers.
Leave the complicated stuff to professionals...
Posted by (6 comments )
Reply Link Flag
Agree, with additional viewpoint
It is true that the relational database is oversold. I won't address the large data volume requirement, but will identify another aspect of the mismatch of RDBMSs: Relational databases are an *extremely poor* match for object-oriented applications. The use of relational databases easily doubles if not triples the cost of OO projects, and cripples the performance that could be achieved if OO databases were used for those kinds of applications. Relational databases are pushed on organizations because DBA staff won't consider alternatives and infrastructure managers want to play it safe. As a result, all their internal software projects suffer. Relational databases are good for ad-hoc queries, which accountants like, but they are a nightmare for OO design.
Posted by (7 comments )
Reply Link Flag
in reference to your OO comment
see www.hibernate.org

It is a high performance object/relational persistence
service for JAVA applications.

Most professionals are using it with great success, in
conjunction with Websphere or Weblogic.

Also, your complaint about performance doesn't wash,
consider Walmart uses an rdbms as their data warehouse,
which has about 50 TB or more.... This is where MPP
processing comes into play...

All of the major rdbms vendors have MPP / Grid scalability
as well as initiatives on RFID data. Use Google to query, for
instance, type in: Teradata RFID. or Oracle RFID, or Sybase
RFID... you'll see that these companies are investing very
heavily, as are all of the storage vendors (Network
Appliance, EMC, etc...).
Posted by (6 comments )
Link Flag
Why, Cliff?
If you were making the architecture decision, what alternative would you go with?

<<The use of relational databases easily doubles if not triples the cost of OO projects, and cripples the performance that could be achieved if OO databases were used for those kinds of applications. >>
Posted by David Arbogast (1712 comments )
Link Flag
Craziness...
RDBMS is unsuitable for "WriteOnce/ReadMaybe?"

Okay... first of all, there is no such thing as "ReadMaybe." If you "Might" have to read the data, then you need read capability. This is a WriteOnce/ReadMany scenario, where not all data may be read. Nothing unique here.

Overhead? The author wants to use FLAT FILES "instead" of a relational database? Silly. First of all, you can create "flat tables" in a relational database. There is the really interesting concept some of us have heard of... it's called "Data Warehousing." And it addresses the vast majority of shortcomings in transaction-based databases for this type of storage. Ask a good DBA. They'll tell you that database designs can vary widely depending on the expected data usage. Apparently, the author thinks that RDBMS systems are only for transaction processing.

But let's look at the other example here.... recording thousands of 3G telephone calls? WHY? What company creates voice recordings of EVERY phone call right now? Is this REALLY a practical application? Even RFID applications only store a small code. I assume somebody probably records phone calls digitally, though... and I assume they are using a DATABASE in the most robust of implementations. Their reason for taking this approach is likely the same reason they'll instantly reject the idea of flat files for data storage: Raw, flat-file IO is more costly than database lookups in terms of system resources. Read: LESS PERFORMANCE.

Now, I'm not hung up on relational databases... with some of the object-oriented work going on, and more recently, aspect-oriented data storage, there are certainly alternative ways to contemplate storing data. But if you are going to suggest that NEW storage systems are needed, to meet NEW demands, the last thing you want to do is to suggest OLD technology that is far less suited for the job than a RDBMS.
Posted by David Arbogast (1712 comments )
Reply Link Flag
what ever happened to isam?
For my shop, it's true, 80% or more of RDBMS data never gets read. We generate massive amounts of data that only need one or two keys to make them accessible. We read it, sum it up, look for exceptions and spit out a message or two.

It's not that RDBMS is bad, it's just that it's not a good fit for simple but big data, and that if you are stuck with Oracle it's damn expensive for what you actually use.
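
For "simple but big" data that needs only one key, an append-only flat file with a key-to-offset index is about the smallest thing that works. The sketch below is an illustration only, not ISAM itself: the class `IndexedFlatFile`, the tab-separated record layout, and the in-memory index are assumptions, and keys and records are assumed to contain no tabs or newlines.

```python
import io


class IndexedFlatFile:
    """Append-only flat file plus an in-memory index of byte offsets per key."""

    def __init__(self, fileobj):
        self.f = fileobj  # any seekable binary file object
        self.index = {}   # key -> list of byte offsets of its records

    def append(self, key, record):
        self.f.seek(0, io.SEEK_END)
        offset = self.f.tell()
        self.f.write(f"{key}\t{record}\n".encode("utf-8"))
        self.index.setdefault(key, []).append(offset)

    def lookup(self, key):
        """Read back only the records for this key, by seeking to each offset."""
        out = []
        for offset in self.index.get(key, []):
            self.f.seek(offset)
            line = self.f.readline().decode("utf-8").rstrip("\n")
            out.append(line.split("\t", 1)[1])
        return out
```

Writes are a single sequential append, and a read touches only the matching records. A real system would also have to persist and rebuild the index, which this sketch leaves out.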
Posted by lizardlists (1 comment )
Reply Link Flag
This May be one of the Dumbest Articles Ever
Who wrote this dreck? Whoever did doesn't have a clue what s/he is pontificating about.

If the CEO whose name is on the byline wrote it, recommendation to investors: get out quick.

If the CEO whose name is on the byline DIDN'T write the article, recommendation to CEO: FIRE the author. They don't know a thing.
Posted by (274 comments )
Reply Link Flag
She's almost on the money...
Most of the replies are made by people who don't seem to work in places with serious volumes of data in the TB or PB range, therefore you fail to see the problem.

If you guys were in charge, you would have made Google run on Oracle, queries would come back in a week and they would have never gotten off the ground due to horrible performance and pathetic scalability. So much for your RDBMS vision.

They store petabytes of data in flat files, not relational databases. They do this for a reason.

RDBMSs are tools that are good for certain situations, but not all of them. The type of data the author of the article mentioned is one of them where RDBMSs are not suitable.

Yeah, sure you can partition the database, offline the partitions and bring 'em back when you need them. This is way too slow and cumbersome just to look up a single record.

The bigger problem is that corporations are data packrats. In most situations all this data is useless and mining 5 year old data to find out a conclusion that may not be valid anymore is also idiotic.

Of course, in some industries there are legal reasons you have to retain the data and doing so in an RDBMS can be expensive and slow.
Posted by nazzdeq (64 comments )
Reply Link Flag
only reasonable comment here
thanks for droppin knowledge....it wasn't so long ago i used to do the same, but then i thought, why bother?
Posted by confucious_says (7 comments )
Link Flag
still not see any money at all
Your reasoning sounds inconsistent and out of context.

Of course bringing partitions offline/online is time consuming, but after all the whole point of this story was the write once/read maybe scenario. Our proposed approach in the comment above should keep the physical layer homogeneous regardless of TB/PB ranges and the existing level of partitioning/clustering. If you're going to use (archived) flat files for a search, it's going to be slow too, and more cumbersome to manage if they are not your native format.

The google story is interesting by itself but it doesn't relate to this story either. Data there is write/read a lot.

Retrospective Data Mining is simply an option and depends on the type of business and the data being collected. Sometimes patterns are a statistical event and it could take a considerable amount of time to collect that data. I find nothing "idiotic" in being able to do research on old data. What's the point of collecting it if it's not usable when/if needed?

As said by others, OODBMS is still a very promising approach, but it consists of adding more structure to the DBMS, not removing it. So the future does not sound very promising for a bunch of dull flat files.
Posted by alx359 (40 comments )
Link Flag
Talk is cheap; Show us your FLAT FILE GENIUS
Products talk.

We have a marketplace to sort out the nonsense of all this.

If Oracle is nonsense, show us the better way.
Posted by (88 comments )
Reply Link Flag
*sigh
Some people just can't be made to learn how to comprehend what they read.

The point is that relational databases have been made out to be the best way, period. That is wrong. Sometimes it is the best solution and sometimes it is not. But yet, many think it is, always.

Google is a perfect example of why relational databases are not always the best way to go.

Flat file may be 'boring', but it is often as fast and efficient as anything else, many times it is faster by a lot. Same goes for a database that persistently stores objects.

This whole discussion reminds me of the #1 rule of professional, efficient programming: "know your data".

If you don't understand the implications of that rule, then you will be condemned to use what is more popular, without knowing whether it is the best solution for you in any specific case.
Posted by pcLoadLetter (395 comments )
Link Flag
Embedded? Object database technology!
It is clear that relational database technology cannot serve all masters. More than 200,000 downloads speak proof that, in Java and .NET environments, the right embedded database technology is a native object database, available as free open source software from http://www.db4o.com.
Posted by (2 comments )
Reply Link Flag
ALL you NEED is MySQL
Try mysql - fast, simple and low cost solution!
Posted by (1 comment )
Reply Link Flag
Data (Unstructured and Structured) - Way to Go
Hi,

In both cases data volumes are very high.

Structured:
This type of data is already structured (sits in databases), so there is very minimal scope for rearranging it and giving a unique context to the data.

Unstructured:
This type of data can deliver a set of data depending on the context.

The Solution:
Any form of data can be retrieved and arranged with more context-driven tools, which in turn maintain only the context of each end-user.

The static data which never changes or which has a permanent scope can be made available in database and the remaining data can be there as "unstructured data". Which are driven by "context engines".

Follow the principle of 80-20, where 20% of static data moved into database and retain 80% of data in unstructured way... which serves all type of contexts..!!

- Thanks
Ramesh N T
Posted by (1 comment )
Reply Link Flag
Trite nonsense
Your article uses common (as in vulgar) industry buzzwords to attempt to justify unconditional acceptance of "trends." That's not what I'd call "rethinking". While it's always useful to attack orthodoxies, one needs some real weapons (in this case reason) to do so.

> However, the character of much of the data
> generated by businesses today does not match the
> strengths of the RDBMS in virtually any respect.

Completely false - the cases where XML and OODBMS might make some sense are the niches.

> This mismatch is revealed within the context of
> Information Lifecycle Management, or assessing
> the handling of data from the time of its
> creation to its obsolescence.

Giving something an acronym (ILM) does not legitimize it, nor make it a useful field of study.

> ILM is rapidly gaining favor within enterprise
> IT departments as an effective approach for
> coping with rapidly growing volumes of corporate
> data. The time is right to rethink how to deal
> with the looming explosion in data volumes.

So ILM has happened prior to this "rethinking" that we need. And yet you're using the admittedly ad hoc ILM to justify a "rethinking"?

> Relational databases--with their transactional,
> dynamic and multi-user features--come with
> functionality that far exceeds what's needed for
> simply storing and accessing write-once /
> read-maybe business data.

"Relational" has nothing to do with the fact that you're describing (I guess) limitations on current SQL DBMSs. DBMS != "database". Relational != SQL.

> This excess functionality requires sizable
> hardware and software investments that grow in
> proportion to the amount of data handled.

You're saying that because I have some extra subroutines and such, that this requires not only linearly-scaled hardware, but additional software? How do you justify this complete non sequitur and mischaracterization of DBMS software?

Besides, if current DBMSs are doing things badly, they should improve. There are no fundamental theoretical issues here - merely bad implementations which have nothing to do with "the nature of data."

> The answer likely resides in pairing the RDBMS
> with a complementary technology that is
> particularly suited to the demands of capturing
> and storing large volumes of this write-once
> data. Ironically, a technology previously
> destined for the history books may well fit
> current and future requirements perfectly: the
> flat file.

Any DBMS worth its salt should be able to offload infrequently accessed data to another disk or even a flat file, without any modification to the query language. A simple concept in computer systems: HIDE IT FROM THE USERS. Make whatever optimizations you must, but make the interface clean and seamless.
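
The "hide it from the users" idea can be illustrated with a tiny two-tier store whose callers only ever see `put` and `get`. This is a sketch under stated assumptions, not how any particular DBMS does it: the class `TieredStore`, the JSON-lines cold file, and the "oldest entry is demoted first" policy are all made up for the example.

```python
import json
import os
import tempfile


class TieredStore:
    """Keeps recent records in memory and demotes older ones to a flat file.
    Callers use the same put/get interface regardless of where data lives."""

    def __init__(self, cold_path, hot_limit=2):
        self.hot = {}  # dicts preserve insertion order, so first = oldest
        self.cold_path = cold_path
        self.hot_limit = hot_limit

    def put(self, key, value):
        self.hot[key] = value
        while len(self.hot) > self.hot_limit:
            old_key = next(iter(self.hot))
            old_val = self.hot.pop(old_key)
            # Demote the oldest record by appending it to the cold flat file.
            with open(self.cold_path, "a", encoding="utf-8") as f:
                f.write(json.dumps([old_key, old_val]) + "\n")

    def get(self, key):
        if key in self.hot:
            return self.hot[key]
        found = None
        if os.path.exists(self.cold_path):
            with open(self.cold_path, encoding="utf-8") as f:
                for line in f:
                    k, v = json.loads(line)
                    if k == key:
                        found = v  # keep scanning: the last write wins
        return found
```

The cold lookup here is a full scan, which is the slow part a real implementation would replace with an index; the point of the sketch is only that the interface does not change when data moves between tiers.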

> Long relegated to application-embedded databases
> and desktop programs, a flat file that borrows a
> key feature from the relational database--the
> index--meets all of the requirements previously
> described for digital-business event data.

Except speed, query languages, logical organization, physical organization, separation of logical and physical, etc. etc. etc.

> At the same time, this approach also delivers on
> the promise of ILM by putting the right data in
> the right place for the right cost without
> sacrificing support for the business.

And why exactly can't the DBMS vendors update their software to hide this manifestation of physical storage from users? Sheer laziness, if anything. This is a burden for the DBMS vendors.

> The time is right to rethink how to deal with
> the looming explosion in data volumes. The
> relational database is an impressive technology,
> but it is also the most expensive way to store
> large volumes of static data simply to provide
> for potential access some time in the future.

Relational has nothing to do with this. You are confusing the implementation with the model; the SQL "model" with the relational model; and physical storage handlers with the DBMS itself. Give this some more thought, I beg you, for the sake of your readers.

> It is a frequently referenced fact that 80
> percent of the data stored in relational
> databases is never accessed once it is written
> to the database.

Nonsense, but if it were true... which 80%?

> Simply put, the relational database is too much
> hammer for the digital business-event nail.

A trite bit of nonsense. Better implementations, not "models" of I.T. like "ILM", are the rational solution.
Posted by ERK (1 comment )
Reply Link Flag