Incremental backups, and backup history

From: Antonios Christofides <A(dot)Christofides(at)itia(dot)ntua(dot)gr>
To: pgsql-general(at)postgresql(dot)org
Subject: Incremental backups, and backup history
Date: 2003-06-19 08:42:28
Message-ID: 20030619084228.GA20482@itia.ntua.gr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I have two backup questions, not much related to each other; here they
are.

First: With PostgreSQL, I can't do incremental backups. pg_dump will
dump the entire database. Thus, if I want to keep daily backups on tape,
I'm in trouble because I'll have to do a full backup every day, which
may need several hours and several tapes.

One workaround I'm thinking is to not store BLOBs in the database, but
store them in the filesystem and store the filenames in the database
instead. This needs some additional work (for example, a garbage
collector to periodically delete unreferenced files), but will move a
large amount of space from the database into the filesystem, which is
capable of incremental backups. Only BLOBs, that is; for some tables
that will have several tens of millions of small rows, I can't think of
any workaround.

Is this filenames-instead-of-BLOBs for easier backup common practice?
Any other ideas or comments?

My second question is a general relational database backup question, not
specifically related to pgsql. Sometimes a user accidentally
deletes/corrupts a file, and discovers it three days later. After they
come panicing to me, I can give them their file as it was three days
ago, because of the backup tape rotation. Now suppose a user deletes
ten employees from the database, and three days later they understand
that this was a bad thing. Now what? I can restore the entire database
and make it as it was three days ago, but I can't restore the particular
deleted records in the current database, as the relations make the
database behave as a single unit.

A colleague suggested, instead of updating or deleting rows, to only
insert rows with a timestamp; for example, instead of updating the
employee row with id=55, you insert a new row with id=55 with the
updated data, and change the primary key to (id, dateinserted). You then
always select the row with max dateinserted. A garbage collector is also
needed to periodically delete obsolete rows older than, say, six months.
Improvements can be made (such as using dateobsoleted instead of
dateinserted or moving the old rows to another table), but even in the
simplest cases I think it will be extremely hard to implement such a
system, again because of the relations.

So, it is a matter of database design? Do I have to design the database
so that it keeps the history of what happened?

Thanks everyone for the answers.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Shridhar Daithankar 2003-06-19 08:50:46 Re: Incremental backups, and backup history
Previous Message Ivar 2003-06-19 08:37:34 Re: How to insert unicode strings ?