Re: A failure of standby to follow timeline switch

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, masao(dot)fujii(at)oss(dot)nttdata(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: A failure of standby to follow timeline switch
Date: 2021-01-13 03:08:30
Message-ID: CAHGQGwGtAqCd0SaAiFSS78imhxuLzSzu0oQ7oFbHjqOD_1zxEA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 13, 2021 at 10:48 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Tue, 12 Jan 2021 10:47:21 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in
> > On Sat, Jan 9, 2021 at 5:08 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > >
> > > Masao-san: Are you intending to act as committer for these? Since the
> > > bug is mine I can look into it, but since you already did all the
> > > reviewing work, I'm good with you giving it the final push.
> >
> > Thanks! I'm planning to push the patch.
> >
> >
> > > 0001 looks good to me; let's get that one committed quickly so that we
> > > can focus on the interesting stuff. While the implementation of
> > > find_in_log is quite dumb (not this patch's fault), it seems sufficient
> > > to deal with small log files. We can improve the implementation later,
> > > if needed, but we have to get the API right on the first try.
> > >
> > > 0003: The fix looks good to me. I verified that the test fails without
> > > the fix, and it passes with the fix.
> >
> > Yes.
> >
> >
> > > The test added in 0002 is a bit optimistic regarding timing, as well as
> > > potentially slow; it loops 1000 times and sleeps 100 milliseconds each
> > > time. On a very slow server (valgrind or clobber_cache animals) this
> > > might not be sufficient time, while on fast servers it may end up
> > > waiting longer than needed. Maybe we can do something like this:
> >
> > On second thought, I think that the regression test should be in
> > 004_timeline_switch.pl instead of 001_stream_rep.pl because it's
>
> Agreed. It's definitely the right place.
>
> > the test about timeline switch. Also I'm thinking that it's better to
> > test the timeline switch by checking whether some data is successfully
> > replicated, like the existing regression test for timeline switch in
> > 004_timeline_switch.pl does, instead of finding the specific message
> > in the log file. I attached the POC patch. Thoughts?
>
> It's practically a check on this issue, and looks better. The 180s
> timeout in the failure case seems a bit annoying, but that's how all
> tests of this kind behave.

Yes.

>
> The last check on table content is actually useless but it might make
> sense to confirm that replication is actually working. However, I
> don't think the test needs to insert as many as 1000 tuples. Just
> a single tuple would suffice.

Thanks for the review!
I'm ok with this change (i.e., insert only a single row).
Attached is the updated version of the patch.

Regards,

--
Fujii Masao

Attachment: v6_follow_timeline_switch.patch (application/octet-stream, 2.6 KB)
