Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: hlinnaka(at)iki(dot)fi
Cc: andres(at)anarazel(dot)de, michael(dot)paquier(at)gmail(dot)com, sfrost(at)snowman(dot)net, nag1010(at)gmail(dot)com, jdnelson(at)dyn(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Date: 2018-04-24 10:57:12
Message-ID: 20180424.195712.56235724.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Thank you very much for looking this!

At Mon, 23 Apr 2018 03:41:47 -0400, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote in <89e33d4f-5c75-0738-3dcb-44c4df59e920(at)iki(dot)fi>
> On 18/01/18 20:54, Kyotaro HORIGUCHI wrote:
> > At Thu, 18 Jan 2018 11:52:52 -0800, Andres Freund <andres(at)anarazel(dot)de>
> > wrote in <20180118195252(dot)hyxgkb3krcqjzfjm(at)alap3(dot)anarazel(dot)de>
> >> On 2018-01-18 20:58:27 +0900, Michael Paquier wrote:
> >>> b) The second patch that I would like to mention is doing things on
> >>> the
> >>> standby side by being able to change a WAL source when fetching a
> >>> single
> >>> record so as it is possible to fetch the beginning of a cross-segment
> >>> record from say an archive or the local xlog, and then request the
> >>> rest
> >>> on the primary. This patch can be found in
> >>> https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp
> >>> and the diff in WaitForWALToBecomeAvailable() scares me a lot because,
> >>> it seems to me that this actually breaks timeline switch
> >>> consistency. The concept of switching a WAL source in the middle of a
> >>> WAL segment is risky anyway, because we perform sanity checks while
> >>> switching sources. The bug we are dealing about here is very rare, and
> >>> seeing a problem like that for a timeline switch is even more rare,
> >>> but
> >>> the risk is not zero and we may finish with a corrupted instance, so
> >>> we
> >>> should not in my opinion take this approach. Also this actually
> >>> achieves
> >>> point 2) above, not completely though, but not 1).
> >>
> >> I don't agree with the sentiment about the approach, but I agree there
> >> might be weaknesses in the implementation (which I have not
> >> reviewed). I
> >> think it's perfectly sensible to require fetching one segment from
> >> pg_xlog and the next one via another method. Currently the inability
> >> to
> >> do so often leads to us constantly refetching the same segment.
> > Though I'm still not fully confident, if reading a set of
> > continuation records from two (or more) sources can lead to
> > inconsistency, two successve (complete) records are facing the
> > same risk.
>
> This seems like the best approach to me as well.
>
> Looking at the patch linked above
> (https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp)
>
> > --- a/src/backend/access/transam/xlog.c
> > +++ b/src/backend/access/transam/xlog.c
> > @@ -11693,6 +11693,10 @@ retry:
> > Assert(reqLen <= readLen);
> > *readTLI = curFileTLI;
> > +
> > + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
> > readBuf))
> > + goto next_record_is_invalid;
> > +
> > return readLen;
> > next_record_is_invalid:
>
> What is the point of adding this XLogReaderValidatePageHeader() call?
> The caller of this callback function, ReadPageInternal(), checks the
> page header anyway. Earlier in this thread, you said:

Without the lines, server actually fails to start replication.

(I try to remember the details...)

The difference is whether the function can cause retry for the
same portion of a set of continued records (without changing the
plugin API). XLogPageRead can do that. On the other hand all
ReadPageInternal can do is just letting the caller ReadRecords
retry from the beginning of the set of continued records since
the caller side handles only complete records.

In the failure case, in XLogPageRead, when read() fails, it can
try the next source midst of a continued records. On the other
hand if the segment was read but it was recycled one, it passes
"success" to ReadPageInternal and leads to retring from the
beginning of the recrod. Infinite loop.

Calling XLogReaderValidatePageHeader in ReadPageInternal is
redundant, but removing it may be interface change of xlogreader
plugin and I am not sure that is allowed.

> > The rest to do is let XLogPageRead retry other sources
> > immediately. To do this I made ValidXLogPageHeader(at)xlogreader(dot)c
> > public (and renamed to XLogReaderValidatePageHeader).
>
> but I don't understand that.

It is exposed so that XLogPageRead can validate the page header
using it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2018-04-24 12:29:47 BUG #15169: create index CONCURRENTLY conflict with other table's COPY
Previous Message David G. Johnston 2018-04-24 04:29:11 Re: BUG #15168: "pg_isready -d" effectively ignores given database name

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2018-04-24 11:05:20 Re: [HACKERS] Custom compression methods
Previous Message Jehan-Guillaume de Rorthais 2018-04-24 10:50:23 Re: Make description of heap records more talkative for flags