From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | andres(at)anarazel(dot)de |
Cc: | michael(dot)paquier(at)gmail(dot)com, nag1010(at)gmail(dot)com, jdnelson(at)dyn(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)? |
Date: | 2017-09-07 03:33:47 |
Message-ID: | 20170907.123347.101584520.horiguchi.kyotaro@lab.ntt.co.jp |
Lists: | pgsql-bugs pgsql-hackers |
Hello,
At Wed, 6 Sep 2017 12:23:53 -0700, Andres Freund <andres(at)anarazel(dot)de> wrote in <20170906192353(dot)ufp2dq7wm5fd6qa7(at)alap3(dot)anarazel(dot)de>
> On 2017-09-06 17:36:02 +0900, Kyotaro HORIGUCHI wrote:
> > The problem is that the current ReadRecord needs the first one of
> > a series of continuation records from the same source with the
> > other part, the master in the case.
>
> What's the problem with that? We can easily keep track of the beginning
> of a record, and only confirm the address before that.
After a failure while reading a record locally, ReadRecord tries
streaming to read from the beginning of the record, which is no
longer on the master, then retries locally, and so on. This loops
forever.
> > A (or the) solution closed in the standby side is allowing to
> > read a series of continuation records from multiple sources.
>
> I'm not following. All we need to use is the beginning of the relevant
> records, that's easy enough to keep track of. We don't need to read the
> WAL or anything.
The beginning is already tracked, and there is nothing more to do
there. I reconsidered that approach and found that it doesn't need
such destructive refactoring.
The first *problem* was that WaitForWALToBecomeAvailable requests
the beginning of a record, which is not on the page the function
has been told to fetch. tliRecPtr is still required to determine
the TLI to request, but the function should request that RecPtr
be streamed.
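To make the difference between the two pointers concrete, here is a rough sketch in Python (not PostgreSQL code). It assumes the default 16 MB WAL segment size; the helper names, the LSN values, and the master's oldest retained segment are all made up for illustration.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size (16 MB)


def segment_of(lsn: int) -> int:
    """Return the number of the WAL segment that contains byte position lsn."""
    return lsn // WAL_SEGMENT_SIZE


def stream_start_segment(tli_rec_ptr: int, rec_ptr: int, use_page_ptr: bool) -> int:
    """Segment the standby would ask the master to start streaming from.

    tli_rec_ptr:  start of the record being read (still used to pick the TLI).
    rec_ptr:      position on the page that is actually missing.
    use_page_ptr: False models the old request, True the proposed one.
    """
    return segment_of(rec_ptr if use_page_ptr else tli_rec_ptr)


# Hypothetical record: starts 100 bytes before the end of segment 5
# and continues into segment 6, whose first page is the missing one.
record_start = 6 * WAL_SEGMENT_SIZE - 100
missing_page = 6 * WAL_SEGMENT_SIZE

# Hypothetical master state: segment 5 has already been removed.
master_oldest_segment = 6

old_request = stream_start_segment(record_start, missing_page, use_page_ptr=False)
new_request = stream_start_segment(record_start, missing_page, use_page_ptr=True)

print(old_request >= master_oldest_segment)  # False: segment 5 is gone
print(new_request >= master_oldest_segment)  # True: segment 6 is available
```

In this toy model, the request based on the record's beginning points into a segment the master no longer has, while the request based on the page position can be served.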
What remains is to let XLogPageRead retry other sources
immediately. To do this I made ValidXLogPageHeader@xlogreader.c
public (and renamed it to XLogReaderValidatePageHeader).
The attached patch fixes the problem and passes the recovery
tests. However, a test for this problem is not included. Such a
test needs to advance to the last page in a segment, write a
record that continues into the next segment, and then kill the
standby after it has received the previous segment but before it
has received the whole record.
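The conditions such a test has to hit can be sketched as simple arithmetic. This is only an illustration in Python, assuming the default 16 MB segment size and 8 kB WAL page size; the function names are hypothetical, and the model ignores page headers and alignment padding.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size
XLOG_BLCKSZ = 8192                   # assumed default WAL page size


def segment_of(lsn: int) -> int:
    """Return the number of the WAL segment that contains byte position lsn."""
    return lsn // WAL_SEGMENT_SIZE


def last_page_start(segno: int) -> int:
    """Byte position where the last page of segment segno begins."""
    return (segno + 1) * WAL_SEGMENT_SIZE - XLOG_BLCKSZ


def crosses_segment_boundary(start_lsn: int, total_len: int) -> bool:
    """True if a record of total_len bytes starting at start_lsn
    ends in a later segment than the one it starts in."""
    return segment_of(start_lsn) != segment_of(start_lsn + total_len - 1)


def in_kill_window(received_lsn: int, start_lsn: int, total_len: int) -> bool:
    """True if the standby has received all of the record's first segment
    but not yet the whole record, i.e. the moment to kill the standby."""
    first_segment_end = (segment_of(start_lsn) + 1) * WAL_SEGMENT_SIZE
    return first_segment_end <= received_lsn < start_lsn + total_len


# Hypothetical record: starts 192 bytes before the end of segment 5.
start = last_page_start(5) + 8000

print(crosses_segment_boundary(start, 500))  # True: 500 > 192 bytes left
print(crosses_segment_boundary(start, 100))  # False: fits in segment 5
print(in_kill_window(6 * WAL_SEGMENT_SIZE, start, 500))  # True
```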
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
0001-Allow-switch-WAL-source-midst-of-record.patch | text/x-patch | 3.9 KB |
0002-Debug-assistant-code.patch | text/x-patch | 1.4 KB |