Re: 040_pg_createsubscriber.pl is slow and unstable (was Re: speed up a logical replica setup)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: 040_pg_createsubscriber.pl is slow and unstable (was Re: speed up a logical replica setup)
Date: 2024-07-30 03:54:52
Message-ID: CAA4eK1JDhdNom8VYUV4YziQEU0_hmoXEC2bpSkP+_u8cwi01nA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 30, 2024 at 1:48 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Sun, Jun 30, 2024 at 2:40 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> ... However, I added a new open item about how the
> >> 040_pg_createsubscriber.pl test is slow and still unstable.
>
> > But that said, I see no commits in the commit history which purport to
> > improve performance, so I guess the performance is probably still not
> > what you want, though I am not clear on the details.
>
> My concern is described at [1]:
>
> >> I have a different but possibly-related complaint: why is
> >> 040_pg_createsubscriber.pl so miserably slow? On my machine it
> >> runs for a bit over 19 seconds, which seems completely out of line
> >> (for comparison, 010_pg_basebackup.pl takes 6 seconds, and the
> >> other test scripts in this directory take much less). It looks
> >> like most of the blame falls on this step:
> >>
> >> [12:47:22.292](14.534s) ok 28 - run pg_createsubscriber on node S
> >>
> >> AFAICS the amount of data being replicated is completely trivial,
> >> so that it doesn't make any sense for this to take so long --- and
> >> if it does, that suggests that this tool will be impossibly slow
> >> for production use. But I suspect there is a logic flaw causing
> >> this. Speculating wildly, perhaps that is related to the failure
> >> Alexander spotted?
>
> The followup discussion in that thread made it sound like there's
> some fairly fundamental deficiency in how wait_for_end_recovery()
> detects end-of-recovery. I'm not too conversant with the details
> though, and it's possible that pg_createsubscriber is just falling
> foul of a pre-existing infelicity.
>
> If the problem can be correctly described as "pg_createsubscriber
> takes 10 seconds or so to detect end-of-stream",
>

The problem can be defined as: "pg_createsubscriber waits for an
additional (new) WAL record to be generated on the primary before it
considers the standby ready to become a subscriber". On busy systems
this shouldn't be a problem, but on idle systems the time to detect
end-of-stream can't be easily predicted.

One of the proposed solutions is for pg_createsubscriber to generate a
dummy WAL record on the publisher/primary using something like
pg_logical_emit_message() or pg_log_standby_snapshot(). This would
fix the problem (BF failures and slow detection of end-of-stream) but
sounds more like a hack. The other ideas we could consider, as
mentioned in [1], require API/design changes, which are not preferable
at this point. So, the only way seems to be to accept the generation of
dummy WAL records to bring predictability to the tests and to the usage
of the tool in general.
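
To make the dummy-record idea concrete, here is a rough sketch of the
statements such a workaround might run on the primary. This is not
code from pg_createsubscriber: the helper name dummy_wal_statement,
the 'pg_createsubscriber' prefix, and the message text are assumptions
made for illustration. Only the two SQL functions themselves are
existing PostgreSQL functions. The helper just builds the SQL text;
executing it on the primary is left to the caller.

```python
# Hypothetical helper for illustration only; not part of pg_createsubscriber.
# It builds SQL text and does not connect to any server.

def dummy_wal_statement(method: str = "emit_message") -> str:
    """Return a SQL statement that makes the primary write one small WAL record."""
    if method == "emit_message":
        # pg_logical_emit_message(transactional boolean, prefix text, content text)
        # With transactional = false the message is written to WAL immediately.
        # The prefix and content strings here are assumed placeholder values.
        return (
            "SELECT pg_catalog.pg_logical_emit_message("
            "false, 'pg_createsubscriber', 'dummy record')"
        )
    if method == "standby_snapshot":
        # pg_log_standby_snapshot() writes a running-transactions snapshot
        # to WAL (available in PostgreSQL 16 and later).
        return "SELECT pg_catalog.pg_log_standby_snapshot()"
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    for m in ("emit_message", "standby_snapshot"):
        print(dummy_wal_statement(m))
```

Either statement would give an otherwise idle primary a new WAL record
for the standby to receive; which one fits better, and where in the
tool it would be issued, is not settled here.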

[1] - https://www.postgresql.org/message-id/CAA4eK1%2Bp%2B7Ag6nqdFRdqowK1EmJ6bG-MtZQ_54dnFBi%3D_OO5RQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
