From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg_basebackup from cascading standby after timeline switch |
Date: | 2012-12-21 12:54:02 |
Message-ID: | 50D45BEA.4070409@vmware.com |
Lists: | pgsql-hackers |
On 17.12.2012 18:58, Magnus Hagander wrote:
> On Mon, Dec 17, 2012 at 5:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
>>> I'm not happy with the fact that we just ignore the problem in a backup
>>> taken from a standby, silently giving the user a backup that won't start
>>> up. Why not include the timeline history file in the backup?
>>
>> +1. I was not aware that we weren't doing that --- it seems pretty
>> foolish, especially since as you say they're tiny.
>
> Yeah, +1. That should probably have been a part of the whole
> "basebackup from slave" patch, so it can probably be considered a
> back-patchable bugfix in itself, no?
Yes, this should be backpatched to 9.2. I came up with the attached patch.
However, thinking about this some more, there's another bug in the way
WAL files are included in the backup when a timeline switch happens.
basebackup.c includes all the WAL files on ThisTimeLineID, but when the
backup is taken from a standby, the standby might've followed a timeline
switch. So it's possible that some of the WAL files should come from
timeline 1, while others should come from timeline 2. This leads to an
error like "requested WAL segment 00000001000000000000000C has already
been removed" in pg_basebackup.
Attached is a script to reproduce that bug, if someone wants to play
with it. It's a bit sensitive to timing, and the paths at the top need
tweaking.
One solution to that would be to pay more attention to which timelines to
include WAL from. basebackup.c could read the timeline history file to
see exactly where the timeline switches happened, and then construct the
filename of each WAL segment using the correct timeline id. Another
approach would be to do readdir() on pg_xlog, and include all WAL files,
regardless of timeline IDs, that fall in the right XLogRecPtr range. The
latter seems easier to backpatch.
- Heikki
Attachment | Content-Type | Size |
---|---|---|
include-all-tli-files-in-backup-1.patch | text/x-diff | 3.4 KB |
recipe12.sh | application/x-sh | 4.5 KB |