Re: [BUG] Archive recovery failure on 9.3+.

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: katsumata(dot)tomonari(at)po(dot)ntts(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUG] Archive recovery failure on 9.3+.
Date: 2014-02-13 16:47:54
Message-ID: 52FCF73A.3040208@vmware.com
Lists: pgsql-hackers

On 02/13/2014 02:42 PM, Heikki Linnakangas wrote:
> The behavior where we prefer a segment from archive with lower TLI over
> a file with higher TLI in pg_xlog actually changed in commit
> a068c391ab0. Arguably changing it wasn't a good idea, but the problem
> your test script demonstrates can be fixed by not archiving the partial
> segment, with no change to the preference of archive/pg_xlog. As
> discussed, archiving a partial segment seems like a bad idea anyway, so
> let's just stop doing that.

After some further thought, while not archiving the partial segment
fixes your test script, it's not enough to fix all variants of the
problem. Even if archive recovery doesn't archive the last, partial,
segment, if the original master server is still running, it's entirely
possible that it fills the segment and archives it. In that case,
archive recovery will again prefer the archived segment with lower TLI
over the segment with newer TLI in pg_xlog.

So I agree we should commit the patch you posted (or something to that
effect). The change to not archive the last segment still seems like a
good idea, but perhaps we should only do that in master.

Even after that patch, you can have a problem in more complicated
scenarios involving both an archive and streaming replication. For
example, imagine a timeline history like this:

TLI

1 ----+--------------------------->
      |
2     +--------------------------->

Now imagine that timeline 1 has been fully archived, and there are WAL
segments much higher than the point where the timeline switch occurred
present in the archive. But none of the WAL segments for timeline 2 have
been archived; they are only present in a master server. You want to
perform recovery to timeline 2, using the archived WAL segments for
timeline 1, and streaming replication to catch up to the tip of timeline 2.

Whether we prefer files from pg_xlog or archive will make no difference
in this case, as there are no files in pg_xlog. But recovery will merrily
apply all the WAL for timeline 1 from the archive that it can find, past
the timeline switch point. After that, when it tries to connect to the
server with streaming replication, it will fail.

There's not much we can do about that in 9.2 and below, but in 9.3 the
timeline history file contains the exact timeline switch points, so we
could be more careful and not apply any extra WAL on the old timeline
past the switch point. We could also be more exact in which files we try
to restore from the archive, instead of just polling every future TLI in
the history.
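
To make the idea concrete, here is a rough sketch (Python, not the
actual C code in the backend) of how the switch points could be used.
Everything in it is an assumption for illustration: the in-memory shape
of the history, the function names, and the example LSNs are all
hypothetical. It assumes the default 16 MB segment size, and it ignores
the fact that the segment containing the switch point can exist under
both TLIs.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory form of a timeline history: one (tli, switch_lsn)
# pair per ancestor timeline, oldest first, meaning "timeline tli was left
# at switch_lsn". LSNs are plain integers (byte positions) in this sketch.
History = List[Tuple[int, int]]

WAL_SEG_SIZE = 16 * 1024 * 1024                 # assumed default segment size
SEGS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE   # 256 segments per "log id"


def tli_for_lsn(history: History, target_tli: int, lsn: int) -> int:
    """Return the timeline that owns lsn on the path to target_tli."""
    for tli, switch_lsn in history:
        if lsn < switch_lsn:
            return tli
    return target_tli


def replay_limit(history: History, tli: int) -> Optional[int]:
    """Return the LSN past which WAL on tli must not be applied.

    None means tli is not an ancestor in the history (for example, it is
    the target timeline), so there is no limit.
    """
    for ancestor_tli, switch_lsn in history:
        if ancestor_tli == tli:
            return switch_lsn
    return None


def segment_file_name(tli: int, lsn: int) -> str:
    """Build the WAL segment file name that holds lsn on timeline tli."""
    segno = lsn // WAL_SEG_SIZE
    return "%08X%08X%08X" % (tli, segno // SEGS_PER_XLOGID,
                             segno % SEGS_PER_XLOGID)


# Example: timeline 2 branched off timeline 1 at a made-up LSN.
history = [(1, 0x5000000)]
lsn = 0x9000000
print(segment_file_name(tli_for_lsn(history, 2, lsn), lsn))
```

With a check like replay_limit(), recovery would stop applying
timeline 1 WAL from the archive at the switch point, and with
tli_for_lsn() it would ask the archive only for the one file it
actually expects for a given position, rather than trying each TLI in
turn.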

- Heikki
