Re: Why we really need timelines *now* in PITR

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Why we really need timelines *now* in PITR
Date: 2004-07-22 00:21:17
Message-ID: 1090455677.2658.1405.camel@localhost.localdomain
Lists: pgsql-hackers

On Wed, 2004-07-21 at 23:42, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > More verbosely (not numbered because they're not a sequence or
> > progression)
>
> > - if no recovery.conf is present we do crash recovery to end of logs on
> > pg_control timeline (i.e. current)
>
> Check.
>
> > - if recovery.conf is present and we do not specify a target we do
> > archive recovery to end of logs on pg_control timeline (i.e. current)
>
> I have done it this way for now, but I'm unconvinced whether this is the
> best default --- it might be that we'd be better off making 'latest' be
> the default. The point here is that when you restore a tar backup,
> 'current' is going to become the timeline that was current when the
> backup was made, not the one that was current just before you wiped
> $PGDATA. I'm not really sure which case is going to be more commonly
> wanted.

Right now, that sounds like the best option. But my head hurts :)

>
> > - if recovery.conf is present and we specify a target, but no timeline,
> > then we do archive recovery on the pg_control timeline only, and assume
> > that the target was supposed to be on this, even if we don't find it
>
> Whether you specify a target stopping point does not matter AFAICS. The
> timeline selection has to be made before we can even look at the data.
>

Yes, I was describing a case where a default behaviour would be required
to make the timeline selection before the "desired" behaviour could be
enacted.

> > - if recovery.conf is present and we specify a timeline of literally
> > 'latest' (without having to know what that is) - then we search archive
> > for the latest history file, then we do archive recovery from the
> > pg_control timeline to the latest timeline as specified in the latest
> > history file. If we specify a target, then this is searched for on
> > whatever timeline we're on as we rollforward.
>
> Check.
>
> > - if recovery.conf is present and we specify a timeline - then we search
> > archive for that history file, then we do archive recovery from the
> > pg_control timeline to the specified timeline as shown in that history
> > file. If we specify a target, then this is searched for on whatever
> > timeline we're on as we rollforward.
>
> Check.
>
> >>> I don't like the name target_in_timeline,
> >>
> >> Agreed, but I don't have a better name offhand for it.
>
> For lack of any better idea, I have swallowed my objections to "target"
> and called it "recovery_target_timeline". We can easily rename the
> parameter if anyone comes up with something more compelling.
>
> Above behavior is all committed to CVS as of a few minutes ago.
>

...very cool.
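
The timeline-selection rules as committed can be summarised in a small function. This is only a sketch, not the backend code: `select_recovery_timeline` and its arguments are hypothetical, and each history file is simplified to the list of ancestor timeline IDs it records. As Tom says, the recovery target (xid or time) plays no part in choosing the timeline, so it does not appear here.

```python
from typing import Dict, List, Optional


def select_recovery_timeline(
    has_recovery_conf: bool,
    pg_control_tli: int,
    target_timeline: Optional[str],
    archived_history: Dict[int, List[int]],
) -> List[int]:
    """Return the timelines replay would follow, oldest first.

    archived_history maps each timeline ID found in the archive to the
    ancestor timeline IDs listed in its history file, oldest first
    (a simplified, hypothetical representation).
    """
    if not has_recovery_conf:
        # Crash recovery: end of logs on the pg_control timeline.
        return [pg_control_tli]
    if target_timeline is None:
        # Archive recovery, but no timeline given: stay on pg_control's.
        return [pg_control_tli]
    if target_timeline == "latest":
        # Highest-numbered timeline known from pg_control or the archive.
        target = max([pg_control_tli, *archived_history])
    else:
        target = int(target_timeline)
    if target == pg_control_tli:
        return [pg_control_tli]
    if target not in archived_history:
        raise ValueError(f"no history file for timeline {target}")
    ancestors = archived_history[target]
    if pg_control_tli not in ancestors:
        raise ValueError(
            f"timeline {target} does not descend from timeline {pg_control_tli}"
        )
    # Replay from the pg_control timeline through later ancestors to the target.
    return ancestors[ancestors.index(pg_control_tli):] + [target]
```

With `history = {2: [1], 3: [1, 2]}`, a freshly restored tar backup taken on timeline 1 stays on `[1]` when no timeline is given, whereas `"latest"` yields `[1, 2, 3]`. That difference is exactly the default Tom is unsure about.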

OK, back to first principles as a cross-check then:

PITR should cope with these scenarios. They are described reasonably
closely but not as exact mechanical tests, so that, if more than one
solution exists for a recovery scenario, all of the paths get tested.

These are written with a view to *rough* functionality of timelines,
rather than reading the above and making up cases to fit. I suggest we
see if these all work, see why not (if not) and make up some other cases
to make sure all possibilities are catered for.

1. We crash, and wish to recover, as per 7.4

2. We are running happily, using an automated standby database. The
first database fails irrecoverably and we are forced to switch to the
second system which recovers quickly to end of logs, though without the
partially full current xlog from the downed system.

3. We are running happily, but spot a rogue transaction that we wish to
expunge. We decide to run a PITR up to that txnid. We do an archive
recovery to a recovery_target_xid. We have available to us local copies
of the xlogs if required.

4. We perform (3), then after operating for an hour, we realise that
this was an extremely bad idea and decide to recover back to the point
BEFORE we started to recover the first time - i.e. to try to pretend we
had never attempted PITR in the first place because there was some even
more important data just recently committed we didn't know about.

5. We attempt (4) but fail because the then-current log, which had not
been archived, was deleted because we thought we wouldn't need it
anymore. We conclude that we made the right choice in the first place and
re-run the PITR, though to a point slightly ahead of where we stopped
last time.

6. We are running a distributed system that does not properly support
two-phase commit in all of its persistent components. One of the other
components fails (of course not pg!) and we are forced to do a PITR to a
point in time that matches the best last known timestamp of all
persistent system components. We PITR to a recovery_target_time.

7. We have just done (6), but 10 minutes into production we realise that
the clocks between 2 of our systems were out by 3 seconds. Not much, but
it is causing serious errors to bang around the system. We decide to
re-run the previous PITR, but this time to a point 3 seconds further
along the same chain of xlogs. We don't specify a timeline, 'cos that's
really complex stuff and we don't understand it.

8. We perform (3,4,5), then after operating for three hours the rogue
transaction happens again. We realise that the rogue transaction is in
fact a deliberate security violation. We immediately close network
access and try to recover. Management decides we must accept the first
rogue transaction's effects, but the second is too large to be
acceptable. We need to recover to a recovery_target_xid prior to the
second attack. The first recovery meant that xids were being reused (on
a different timeline) and so the xid we wish to recover MAY exist on
both the first and second timeline. To ensure we don't recover the wrong
transactions, we decide to specify we wish to recover to a
recovery_target_xid on recovery_target_timeline = 2.

9. A mistake was made setting a system clock - the month and day
were transposed (7th May -> 5 July), so setting the system apparently
into the future. To reset the clock, we have to perform a full database
recovery into the newly reset system which is now apparently in the
past. We rollforward to end of logs using local and archive copies. We
want crash recovery to still work, even though we have apparently gone
backwards in time according to the log timestamps.

10. We perform (6), then realise that the database server is hosted in
another timezone and we accidentally recovered to a different point in
time, out by a few hours. We want to re-run the recovery, correctly
specifying the point in time.

11. We are in the same position as (7), but specify a timeline and also a
time that in fact does not exist on that timeline.
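
Scenario 8's point, that a bare xid becomes ambiguous once xids have been handed out again on a new timeline, can be shown with a toy model. This is a sketch under simplifying assumptions: log positions are plain integers, `branch_lsn` (a hypothetical name) records where each timeline forked from its parent, and only commit records are modelled.

```python
import math
from typing import Dict, List, NamedTuple, Optional


class Commit(NamedTuple):
    timeline: int
    lsn: int  # log position, simplified to an integer
    xid: int


def replay_order(records: List[Commit], path: List[int],
                 branch_lsn: Dict[int, int]) -> List[Commit]:
    """Commits in the order replay meets them when following `path`.

    branch_lsn[t] is the log position at which timeline t forked from its
    parent; the parent's records from that point on are not replayed.
    """
    ordered: List[Commit] = []
    for i, tli in enumerate(path):
        lo = branch_lsn[tli] if i > 0 else 0
        hi = branch_lsn[path[i + 1]] if i + 1 < len(path) else math.inf
        segment = [r for r in records if r.timeline == tli and lo <= r.lsn < hi]
        ordered.extend(sorted(segment, key=lambda r: r.lsn))
    return ordered


def find_target_xid(records: List[Commit], path: List[int],
                    branch_lsn: Dict[int, int], xid: int) -> Optional[Commit]:
    """First commit of `xid` met along `path`, or None if it never appears."""
    for rec in replay_order(records, path, branch_lsn):
        if rec.xid == xid:
            return rec
    return None


# Timeline 1: the original history. Timeline 2: created by the first PITR,
# which stopped after xid 100 and then handed out xids 101.. all over again.
records = [
    Commit(1, 10, 100), Commit(1, 20, 101), Commit(1, 30, 102), Commit(1, 40, 103),
    Commit(2, 16, 101), Commit(2, 26, 102), Commit(2, 36, 103),
]
branch_lsn = {2: 15}

print(find_target_xid(records, [1], branch_lsn, 102))
# Commit(timeline=1, lsn=30, xid=102)
print(find_target_xid(records, [1, 2], branch_lsn, 102))
# Commit(timeline=2, lsn=26, xid=102)
```

The same `recovery_target_xid` stops at a different transaction depending on the timeline path. When the target never appears on the chosen path, as in scenario 11, the sketch simply returns `None`; whether real recovery should then run to end of logs or refuse is one of the things to test.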

That gives us enough to talk through and begin some testing.

Anybody have any other horror stories, bring 'em on.
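
Scenarios 6, 7 and 10 all come down to how a `recovery_target_time` is derived and interpreted. A minimal sketch, assuming fixed UTC offsets rather than real time-zone rules, and reading "best last known timestamp" as the earliest of the components' last-known-good times (an interpretation, not something stated above); `as_instant` and `common_target` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def as_instant(wall_clock: str, utc_offset_hours: float) -> datetime:
    """Read a 'YYYY-MM-DD HH:MM:SS' string as local time at a fixed UTC offset."""
    naive = datetime.strptime(wall_clock, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))


def common_target(last_good: Dict[str, datetime],
                  clock_correction: Dict[str, timedelta]) -> datetime:
    """Latest instant at which every component is known to be good.

    clock_correction[name] is what must be added to that component's
    timestamps to express them on the database server's clock.
    """
    return min(t + clock_correction.get(name, timedelta(0))
               for name, t in last_good.items())


# Scenario 10: the same wall-clock string, read in two time zones.
london = as_instant("2004-07-21 17:00:00", +1)    # BST
new_york = as_instant("2004-07-21 17:00:00", -4)  # EDT
print(new_york - london)  # 5:00:00

# Scenarios 6 and 7: the queue's clock runs 3 seconds behind the database's.
utc = timezone.utc
last_good = {
    "database": datetime(2004, 7, 21, 16, 0, 10, tzinfo=utc),
    "queue": datetime(2004, 7, 21, 16, 0, 0, tzinfo=utc),
}
print(common_target(last_good, {}))
# 2004-07-21 16:00:00+00:00
print(common_target(last_good, {"queue": timedelta(seconds=3)}))
# 2004-07-21 16:00:03+00:00
```

Correcting for the skewed clock moves the target 3 seconds later along the same chain of xlogs, which is the re-run scenario 7 describes.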

> > Another thing I note is that archive_status .ready messages are written
> > for all restored xlog files (rather than .done messages).
>
> I think this is gone now. However, we still have the issue of preventing
> re-archival of old, incomplete XLOG segments that might be brought back
> into pg_xlog/ as a result of restoring a tar backup. I don't have a
> better solution to that than the one Bruce and I proposed yesterday
> (make the DBA clean out pg_xlog before starting a recovery run).

I'll give that some thought.
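
The interim rule (clean out pg_xlog before a recovery run) is at least easy to check mechanically. A minimal sketch of a hypothetical pre-flight script, assuming segment files are named with 24 upper-case hex digits (timeline, log, segment) now that timelines are in place; the function names are made up for illustration.

```python
import os
import re
from typing import List

# Assumed segment naming: 24 upper-case hex digits (timeline, log, segment).
SEGMENT_NAME = re.compile(r"[0-9A-F]{24}")


def leftover_segments(pg_xlog_dir: str) -> List[str]:
    """XLOG segment files still present in pg_xlog, sorted by name.

    Subdirectories such as archive_status and non-segment files are ignored.
    """
    return sorted(
        name for name in os.listdir(pg_xlog_dir)
        if SEGMENT_NAME.fullmatch(name)
        and os.path.isfile(os.path.join(pg_xlog_dir, name))
    )


def check_ready_for_recovery(pg_xlog_dir: str) -> None:
    """Refuse to proceed while old segments could be picked up and re-archived."""
    leftovers = leftover_segments(pg_xlog_dir)
    if leftovers:
        raise RuntimeError(
            f"{len(leftovers)} old XLOG segment(s) in {pg_xlog_dir}; "
            "move them aside before starting archive recovery"
        )
```

It deliberately only reports and never deletes: scenario 5 above is what happens when an unarchived current segment is thrown away too eagerly.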

Best Regards, Simon Riggs
