Re: Incremental backups, and backup history

From: Dennis Gearon <gearond(at)cvc(dot)net>
To: Antonios Christofides <A(dot)Christofides(at)itia(dot)ntua(dot)gr>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Incremental backups, and backup history
Date: 2003-06-19 15:02:51
Message-ID: 3EF1D09B.9050203@cvc.net
Lists: pgsql-general

On your second question:

Keeping old data helps with data analysis, i.e., data mining. I would record employment changes as transactions. To see whether an employee is still an employee, look for the latest transaction: hired, rehired, contracted as a temp/consultant, fired, laid off, etc.
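
A minimal sketch of that idea, assuming a hypothetical employment_events
table (the table and column names are illustrative, not something from the
thread):

    -- Hypothetical event log: one row per employment change; rows are only
    -- ever inserted, never updated or deleted.
    CREATE TABLE employment_events (
        employee_id  integer   NOT NULL,
        event_type   text      NOT NULL,  -- 'hired', 'rehired', 'contracted', 'fired', 'laid off', ...
        event_date   timestamp NOT NULL DEFAULT now(),
        PRIMARY KEY (employee_id, event_date)
    );

    -- Current status of employee 55: the most recent event wins.
    SELECT event_type
      FROM employment_events
     WHERE employee_id = 55
     ORDER BY event_date DESC
     LIMIT 1;

Your colleague's (id, dateinserted) scheme described below works the same
way: you always select the row with the greatest timestamp, and a periodic
job prunes rows older than your retention window.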

Antonios Christofides wrote:

> Hi,
>
> I have two backup questions, not much related to each other; here they
> are.
>
> First: With PostgreSQL, I can't do incremental backups. pg_dump will
> dump the entire database. Thus, if I want to keep daily backups on tape,
> I'm in trouble because I'll have to do a full backup every day, which
> may need several hours and several tapes.
>
> One workaround I'm thinking is to not store BLOBs in the database, but
> store them in the filesystem and store the filenames in the database
> instead. This needs some additional work (for example, a garbage
> collector to periodically delete unreferenced files), but will move a
> large amount of space from the database into the filesystem, which is
> capable of incremental backups. That only covers BLOBs, though; for some
> tables that will have several tens of millions of small rows, I can't
> think of any workaround.
>
> Is this filenames-instead-of-BLOBs for easier backup common practice?
> Any other ideas or comments?
>
> My second question is a general relational database backup question, not
> specifically related to pgsql. Sometimes a user accidentally
> deletes/corrupts a file, and discovers it three days later. After they
> come panicking to me, I can give them their file as it was three days
> ago, because of the backup tape rotation. Now suppose a user deletes
> ten employees from the database, and three days later they understand
> that this was a bad thing. Now what? I can restore the entire database
> and make it as it was three days ago, but I can't restore the particular
> deleted records in the current database, as the relations make the
> database behave as a single unit.
>
> A colleague suggested, instead of updating or deleting rows, to only
> insert rows with a timestamp; for example, instead of updating the
> employee row with id=55, you insert a new row with id=55 with the
> updated data, and change the primary key to (id, dateinserted). You then
> always select the row with max dateinserted. A garbage collector is also
> needed to periodically delete obsolete rows older than, say, six months.
> Improvements can be made (such as using dateobsoleted instead of
> dateinserted or moving the old rows to another table), but even in the
> simplest cases I think it will be extremely hard to implement such a
> system, again because of the relations.
>
> So, is it a matter of database design? Do I have to design the database
> so that it keeps the history of what happened?
>
> Thanks everyone for the answers.
>
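
On the first question, a minimal sketch of the filenames-instead-of-BLOBs
layout described above; the table and column names are hypothetical, and the
garbage collection is left to an external script:

    -- Hypothetical: large documents live on the filesystem (which can be
    -- backed up incrementally); the database stores only the path.
    CREATE TABLE documents (
        doc_id     serial PRIMARY KEY,
        file_path  text   NOT NULL UNIQUE
    );

    -- Garbage collection sketch: a periodic external script lists the files
    -- on disk and removes any file whose path is absent from this result.
    SELECT file_path FROM documents;

With that split, pg_dump stays small while the bulky files are covered by the
filesystem's incremental backup.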
