Re: Recovery target 'immediate'

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Recovery target 'immediate'
Date: 2013-04-26 10:29:05
Message-ID: 517A56F1.8020200@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 26.04.2013 12:16, Simon Riggs wrote:
> On 18 April 2013 19:11, Heikki Linnakangas<hlinnakangas(at)vmware(dot)com> wrote:
>
>> I just found out that if you use continuous archiving and online backups,
>> it's surprisingly difficult to restore a backup, without replaying any more
>> WAL than necessary.
>
> I didn't add it myself because I don't see the need, if we think more carefully.
>
> Why would you want your recovery end time to be governed solely by the
> time that the *backup* ended? How can that have any bearing on what
> you want at recovery time? If you have access to more WAL data, why
> would you not apply them as well - unless you have some specific
> reason not to - i.e. an incorrect xid or known problem time?
>
> If you're storing only a few of the WAL files with the backup then it
> will end naturally without assistance when the last file runs out.
> What is the difference between stopping at an exact point in WAL half
> way through a file and ending at the end of the file? If the end point
> is arbitrary, why the need to specify it so closely?
>
> I can't see a time when I have access to more WAL files *and* I want
> to stop early at some imprecise point. But you could write a
> restore_command script that stopped after a specific file forcing
> recovery to end.

Well, I ran into this with VMware's Data Director, which manages backups
among other things. In a typical setup, you have a WAL archive, and
every now and then (daily, typically) a full backup is taken. Full
backups are retained for some time, like a few weeks or months. The user
can also manually request a full backup to be taken at any time.

There is an option to perform PITR. The system figures out the latest
full backup that precedes the chosen point-in-time, sets
recovery_target_time, and starts up Postgres. But there is also an
operation to simply "restore a backup". The idea of that is to, well,
restore to the chosen backup, and nothing more. In most cases, it
probably wouldn't hurt if a one or two extra WAL files are replayed
beyond the backup end time, but you certainly don't want to replay all
the history. Yes, you could set recovery_target_time to the point where
the backup ended, but that's complicated. You'd have to read the
end-of-backup timestamp from the backup history file. And because
timestamps are always a bit fuzzy, I think you'd have to add at least a
few seconds to that to be sure.

To illustrate why it would be bad to replay more WAL than necessary,
imagine that the user is about to perform some dangerous action he might
want to undo later. For example, he's about to purge old data that isn't
needed anymore, so with "DELETE FROM data WHERE year <= '2010'". The
first thing he does is to take a backup with label
"before-purging-2010". Immediately after the backup has finished, he
performs the deletion. Now, the application stops working because it
actually still needs the data, so he restores from the backup. If
recovery decides to replay a few more WAL files after the end-of-backup,
that could include the deletion, and that's no good.

One solution is to create restore point after the backup ends. Then you
have a clearly defined point in time you can restore to. But it would be
convenient to not have to do that. Or another way to think of this is
that it would be convenient if there was an implicit restore point at
the end of each backup.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Pflug 2013-04-26 11:11:00 Re: Substituting Checksum Algorithm (was: Enabling Checksums)
Previous Message Simon Riggs 2013-04-26 09:16:38 Re: Recovery target 'immediate'