Greg Stark <gsstark(at)mit(dot)edu> wrote:
> What would be useful is a tool which given a list of standby
> databases and list of base backup images can apply a set of policy
> rules to determine which base backups and archived logs to delete.
>
> The policy might look something like "keep one base backup per
> week going back a month and one per day going back seven days and
> keep archived logs going back far enough for any of these base
> backups or any of these live replicas."
>
> Bonus points if you can say "also keep one base backup per month
> going back three years with just enough archived logs to recover
> the base backup to a consistent state".

Hmmm... Our policy is "Keep the most recent base backup and all WAL
files from the point needed to use it to current on the backup
server local to the source database; keep the most recent two weekly
backups and all the WAL files from the point needed to start the
earlier of the two to current on the central backup server (along
with a warm standby instance running for each source to confirm that
the backup and WAL files are usable); and keep the first weekly
backup of each month and just enough WAL files for a consistent
start of each on a mirrored SAN archive for one year."

Just in case you're looking for real-life policies currently in use.

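As an illustration of the selection step in a policy like the one
Greg describes ("one per day going back seven days, one per week
going back a month"), here is a minimal Python sketch. The function
name, its parameters, and the assumption of at most one base backup
per calendar day are hypothetical, chosen only for illustration; they
do not come from any existing tool.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, daily_days=7, weekly_days=30):
    """Return the set of base backup dates a simple tiered policy retains.

    Assumption (illustrative only): each base backup is identified by
    its calendar date, with at most one backup per date.
    """
    keep = set()
    weeks_seen = set()
    for d in sorted(backup_dates, reverse=True):  # newest first
        age = (today - d).days
        if age < 0:
            continue  # ignore backups dated in the future
        week = tuple(d.isocalendar()[:2])  # (ISO year, ISO week)
        if age < daily_days:
            # Daily tier: keep every backup from the last daily_days days.
            keep.add(d)
            weeks_seen.add(week)
        elif age <= weekly_days and week not in weeks_seen:
            # Weekly tier: keep the newest backup of each uncovered week.
            keep.add(d)
            weeks_seen.add(week)
    return keep


# Usage: one backup per day for January 2010, evaluated on the 31st.
today = date(2010, 1, 31)
all_backups = [today - timedelta(days=n) for n in range(31)]
kept = backups_to_keep(all_backups, today)
# WAL must be retained from the start of the oldest kept backup onward.
oldest_kept = min(kept)
```

Everything not in the returned set is a candidate for deletion, and
the oldest retained backup fixes how far back archived WAL is needed.
A real tool would also have to account for live replicas, which this
sketch ignores.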
By the way, the .backup files and the information from
pg_controldata and pg_ctl status currently provide just enough
information for this all to be run from bash scripts without much
human attention, but the effort required to get there is far from
trivial. I'm sure that something which made such a policy easy to
implement would be useful to some shops. Hopefully that can be done
without breaking current scripts.
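
To give a flavor of what such scripts have to do with the .backup
files, here is a rough Python sketch (not our actual bash). It
assumes the backup history file contains a line of the form
"START WAL LOCATION: 0/9000020 (file 000000010000000000000009)" and
that WAL segment names are 24 uppercase hex characters whose first 8
characters are the timeline. Both function names are hypothetical.

```python
import re

# Assumed layout of the relevant line in a .backup history file.
START_RE = re.compile(
    r"^START WAL LOCATION: \S+ \(file ([0-9A-F]{24})\)", re.MULTILINE
)


def start_wal_file(history_text):
    """Return the first WAL segment name needed by a base backup."""
    m = START_RE.search(history_text)
    if m is None:
        raise ValueError("no START WAL LOCATION line found")
    return m.group(1)


def removable_segments(archived_names, oldest_needed):
    """List archived WAL segments older than oldest_needed.

    Conservative by design: only plain 24-character segment names on
    the same timeline as oldest_needed are considered; .backup files
    and segments from other timelines are left alone.
    """
    timeline = oldest_needed[:8]
    return sorted(
        n for n in archived_names
        if len(n) == 24 and n[:8] == timeline and n < oldest_needed
    )


# Usage with a made-up history file and archive listing.
sample = (
    "START WAL LOCATION: 0/9000020 (file 000000010000000000000009)\n"
    "STOP WAL LOCATION: 0/90000F0 (file 000000010000000000000009)\n"
)
needed = start_wal_file(sample)
archive = [
    "000000010000000000000007",
    "000000010000000000000008",
    "000000010000000000000009",
    "000000010000000000000009.00000020.backup",
]
candidates = removable_segments(archive, needed)
```

The comparison relies on segment names sorting lexically within one
timeline. Anything that crosses a timeline switch needs more care
than this sketch takes.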

-Kevin