From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave |
Date: | 2013-01-17 21:49:22 |
Message-ID: | 50F871E2.7090504@vmware.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 17.01.2013 21:57, Heikki Linnakangas wrote:
> On 17.01.2013 20:08, Andres Freund wrote:
>> On 2013-01-18 03:05:47 +0900, Fujii Masao wrote:
>>> I encountered the problem that the timeline switch is not performed
>>> expectedly.
>>> I set up one master, one standby and one cascade standby. All the
>>> servers
>>> share the archive directory. restore_command is specified in the
>>> recovery.conf
>>> in those two standbys.
>>>
>>> I shut down the master, and then promoted the standby. In this case, the
>>> cascade standby should switch to new timeline and replication should be
>>> successfully restarted. But the timeline was never changed, and the
>>> following
>>> log messages were kept outputting.
>>>
>>> sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
>>> sby2 LOG: replication terminated by primary server
>>> sby2 DETAIL: End of WAL reached on timeline 1
>>> sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
>>> sby2 LOG: replication terminated by primary server
>>> sby2 DETAIL: End of WAL reached on timeline 1
>>> sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
>>> sby2 LOG: replication terminated by primary server
>>> sby2 DETAIL: End of WAL reached on timeline 1
>>> ....
>>
>> That's after the commit or before? Because in passing I think I
>> noticed/fixed a bug that could cause exactly that problem...
>
> I think I broke that with the "teach pg_receivexlog to cross timelines"
> patch. Will take a look...
Ok, there was a couple of issues. First, as I suspected above, I added a
new result set after copy has ended in START_STREAMING command, but
forgot to teach walreceiver about it. Fixed that.
Secondly, there's an interesting interaction between the new xlogreader
code and streaming replication and timeline switches:
The first call to the page_read callback in xlogreader is always made
with size SizeOfXLogRecord, counting from the beginning of the page.
That's bogus; there can be no WAL record in the very beginning of page,
because of the page header, so I think that was supposed to be
SizeXLogShortPHD. But that won't fix the issue.
The problem is that XLogPageRead callback uses the page address and
requested length to decide what timeline to stream from. For example,
imagine that there's a timeline switch at offset 1000 in a page, and we
have already read all the WAL up to that point. When xlogreader first
asks to fetch the first 32 bytes from the page (the bogus
SizeOfXLogRecord), XLogPageRead deduces that that byte range is still on
the old timeline, and starts streaming from the old timeline. Next,
xlogreader needs the rest of the page, up to 1000 + SizeOfPageHeader, to
read the first record it's actually interested in, XLogPageRead will
return an error because that range is not on the timeline that's
currently streamed. And we loop back to retry, and run into the same
problem again.
This interaction is a bit too subtle for my taste, but the
straightforward fix is to just modify xlogreader so that the first
read_page call requests all the bytes up to record we're actually
interested in. That seems like a smart thing to do anyway.
I committed a patch for that second issue too, but please take a look in
case you come up with a better idea.
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Dimitri Fontaine | 2013-01-17 21:51:03 | Re: Event Triggers: adding information |
Previous Message | Andres Freund | 2013-01-17 21:46:21 | HS locking broken in HEAD |