Re: a provocative question?

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: a provocative question?
Date: 2007-09-06 19:40:27
Message-ID: 607in3twes.fsf@dba2.int.libertyrms.com
Lists: pgsql-general

tjo(at)acm(dot)org ("TJ O'Donnell") writes:
> I am getting in the habit of storing much of my day-to-day
> information in postgres, rather than "flat" files.
> I have not had any problems of data corruption or loss,
> but others have warned me against abandoning files.
> I like the benefits of enforced data types, powerful searching,
> data integrity, etc.
> But I worry a bit about the "safety" of my data, residing
> in a big scary database, instead of a simple friendly
> folder-based files system.
>
> I ran across this quote on Wikipedia at
> http://en.wikipedia.org/wiki/Eudora_%28e-mail_client%29
>
> "Text files are also much safer than databases, in that should disk
> corruption occur, most of the mail is likely to be unaffected, and any
> that is damaged can usually be recovered."
>
> How naive (optimistic?) is it to think that "the database" can
> replace "the filesystem"?

There is certainly some legitimacy to the claim; the demerits of
things like the Windows Registry as compared to "plain text
configuration" have been pretty clear.

If the "monstrous fragile binary data structure" gets stomped on, by
any means, then you can lose data in pretty massive and invisible
ways. It's most pointedly true if the data representation conflates
data and indexes in some attempt to "simplify" things by having Just
One File. In such a case, if *any* block gets corrupted, that has the
potential to irretrievably destroy the database.

However, the argument may also be taken too far.

-> A PostgreSQL database does NOT assemble data into "one monstrous
fragile binary data structure."

Each table consists of data files that are separate from index
files. Blowing up an index file *doesn't* blow up the data.
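To show why losing a separate index is the less dangerous failure, here is a minimal Python sketch. It assumes a toy in-memory model: `heap` and `build_index` are hypothetical names, not PostgreSQL internals. The index is derived entirely from the table data, so if it is destroyed it can be rebuilt by rescanning the rows, much as PostgreSQL's `REINDEX` rebuilds an index from its table.

```python
from collections import defaultdict

# Toy model (hypothetical, not PostgreSQL internals): the "heap" holds the rows,
# and the "index" maps a column value to row positions derived from the heap.
heap = [
    {"id": 1, "sender": "tjo@example.org", "subject": "a provocative question?"},
    {"id": 2, "sender": "cbbrowne@example.org", "subject": "Re: a provocative question?"},
    {"id": 3, "sender": "tjo@example.org", "subject": "follow-up"},
]

def build_index(rows, column):
    """Rebuild an index on `column` by scanning every row (cf. REINDEX)."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[column]].append(pos)
    return dict(index)

index = build_index(heap, "sender")

# Simulate the index getting stomped on: the rows themselves are untouched,
# so the index can simply be regenerated from them.
index = None
index = build_index(heap, "sender")
print(index["tjo@example.org"])  # [0, 2]
```

The reverse is not true: if the rows are damaged, no index can bring them back, which is why keeping the two in separate files limits the damage from a single bad block.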

-> You are taking regular backups, right???

If you are, that's a considerable mitigation of risks. I don't
believe it's typical to set up off-site backups of one's Windows
Registry, in contrast...

-> In the case of PostgreSQL, mail stored in tuples is likely to get
TOASTed, which changes the shape of things further; the files get
smaller (due to compression), which changes the "target profile"
for this data.
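As a rough illustration of how compression shrinks the on-disk footprint of mail text, here is a minimal Python sketch. It uses `zlib` purely as a stand-in: PostgreSQL's TOAST uses its own pglz compressor, so the actual ratios will differ. The mail body is hypothetical, built from quoted-reply text of the kind that compresses well.

```python
import zlib

# Hypothetical, highly repetitive mail body (quoted replies compress well).
body = ("> I am getting in the habit of storing much of my day-to-day\n"
        "> information in postgres, rather than \"flat\" files.\n") * 50
raw = body.encode("utf-8")

# zlib is only a stand-in here; TOAST uses PostgreSQL's own pglz codec.
packed = zlib.compress(raw)

print(len(raw), len(packed))
assert zlib.decompress(packed) == raw  # lossless round trip
```

Fewer bytes means fewer disk blocks holding the data, though a damaged compressed block is no longer readable by eye the way a damaged plain-text file is.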

-> In the contrary direction, storing the data as a set of files, each
of which requires metadata stored in binary filesystem data
structures, provides an (invisible-to-the-user) interface to
what is nothing more or less than a "monstrous fragile binary data
structure."

That is, after all, what a filesystem is, if you strip out the
visible APIs that turn it into open()/close()/mkdir() calls.

If the wrong directory block gets "crunched," then /etc could get
munched just like the Windows Registry could.

Much of the work that has gone into filesystems over the last dozen
years is *exceedingly* similar to the work that has gone into managing
storage in DBMSes. People working in both areas borrow from each other.

The natural result is that they live in fairly transparent homes in
relation to one another. Someone who "casts stones" of the sort in
your quote is making the fallacious assumption that, because the fact
that a filesystem is a database of file information is kept largely
invisible, a filesystem is somehow fundamentally less vulnerable to
the same kinds of corruption.

Reality is that they are vulnerable in similar ways.

The one thing I could point to, in Eudora, as a *further* visible
merit that DOES retain validity is that there is not terribly much
metadata entrusted to the filesystem. Much the same is true for the
Rand MH "Mail Handler", where each message is a file with very little
filesystem-based metadata.

If you should have a filesystem failure, and discover you have a
zillion no-longer-named files in lost+found, and decline to recover from a
backup, it should nonetheless be possible to re-process them through
any mail filters, and rebuild a mail filesystem that will appear
roughly similar to what it was like before.
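The recovery described above can be sketched in Python with the standard `email` package. This is a minimal sketch under stated assumptions: the lost+found names, the message contents, and the filter rule (file by `List-Id`, otherwise into the inbox) are all hypothetical. The point is that each message carries its own headers, so a folder structure can be reconstructed from the raw messages alone, without any surviving filesystem metadata.

```python
import email
from email import policy
from collections import defaultdict

# Hypothetical contents of lost+found: the file names are gone,
# but each raw message still carries its own headers.
orphans = {
    "#1234": b"From: tjo@example.org\nTo: pgsql-general@example.org\n"
             b"List-Id: pgsql-general\nSubject: a provocative question?\n\nbody\n",
    "#1235": b"From: friend@example.net\nTo: me@example.net\n"
             b"Subject: lunch\n\nbody\n",
}

def refile(raw_messages):
    """Route each raw message to a folder using only its own headers."""
    folders = defaultdict(list)
    for name, raw in raw_messages.items():
        msg = email.message_from_bytes(raw, policy=policy.default)
        # Hypothetical filter rule: mailing-list traffic gets its own folder.
        list_id = msg.get("List-Id")
        folder = str(list_id) if list_id is not None else "inbox"
        folders[folder].append(str(msg["Subject"]))
    return dict(folders)

print(refile(orphans))
# {'pgsql-general': ['a provocative question?'], 'inbox': ['lunch']}
```

A real setup would run the messages back through whatever filters (procmail, MH's own tools, and so on) originally sorted them; the toy rule here just stands in for that step.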

That actually implies that there is *more* "conservatism of format"
than first meets the eye; in effect, the data is left in raw form,
replete with redundancies that must *never* be taken out if the
ability to perform this recovery process is to be retained.

There is, in effect, more than meets the eye here...
--
(format nil "~S(at)~S" "cbbrowne" "acm.org")
http://linuxfinances.info/info/advocacy.html
"Lumping configuration data, security data, kernel tuning parameters,
etc. into one monstrous fragile binary data structure is really dumb."
- David F. Skoll
