Re: 040_pg_createsubscriber.pl is slow and unstable (was Re: speed up a logical replica setup)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: 040_pg_createsubscriber.pl is slow and unstable (was Re: speed up a logical replica setup)
Date: 2024-07-30 06:22:15
Message-ID: CAA4eK1Luc4HcRUx0bwgEWyj8YQ+RCyu5QLj_LZAHq9U2OfGf5w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 30, 2024 at 11:28 AM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Tue, Jul 30, 2024 at 9:25 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Jul 30, 2024 at 1:48 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > >
> > > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > > > On Sun, Jun 30, 2024 at 2:40 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > >> ... However, I added a new open item about how the
> > > >> 040_pg_createsubscriber.pl test is slow and still unstable.
> > >
> > > > But that said, I see no commits in the commit history which purport to
> > > > improve performance, so I guess the performance is probably still not
> > > > what you want, though I am not clear on the details.
> > >
> > > My concern is described at [1]:
> > >
> > > >> I have a different but possibly-related complaint: why is
> > > >> 040_pg_createsubscriber.pl so miserably slow? On my machine it
> > > >> runs for a bit over 19 seconds, which seems completely out of line
> > > >> (for comparison, 010_pg_basebackup.pl takes 6 seconds, and the
> > > >> other test scripts in this directory take much less). It looks
> > > >> like most of the blame falls on this step:
> > > >>
> > > >> [12:47:22.292](14.534s) ok 28 - run pg_createsubscriber on node S
> > > >>
> > > >> AFAICS the amount of data being replicated is completely trivial,
> > > >> so that it doesn't make any sense for this to take so long --- and
> > > >> if it does, that suggests that this tool will be impossibly slow
> > > >> for production use. But I suspect there is a logic flaw causing
> > > >> this. Speculating wildly, perhaps that is related to the failure
> > > >> Alexander spotted?
> > >
> > > The followup discussion in that thread made it sound like there's
> > > some fairly fundamental deficiency in how wait_for_end_recovery()
> > > detects end-of-recovery. I'm not too conversant with the details
> > > though, and it's possible that pg_createsubscriber is just falling
> > > foul of a pre-existing infelicity.
> > >
> > > If the problem can be correctly described as "pg_createsubscriber
> > > takes 10 seconds or so to detect end-of-stream",
> > >
> >
> > The problem can be defined as: "pg_createsubscriber waits for an
> > additional (new) WAL record to be generated on primary before it
> > considers the standby is ready for becoming a subscriber". Now, on
> > busy systems, this shouldn't be a problem but for idle systems, the
> > time to detect end-of-stream can't be easily defined.
>
> AFAIU, the server will emit a running transactions WAL record at least
> every 15 seconds.
>

AFAICU, this is not true: the code suggests that bgwriter inserts the
running xacts record only when enough time has passed and interesting
records have been inserted since the last snapshot, so on an idle
primary it may not be emitted at all. Please take a look at the
following comment and the code around it in bgwriter.c: "Only log if
enough time has passed and interesting records have been inserted
since the last snapshot...".
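
To make the two-part condition concrete, here is a minimal toy model of it in Python. It is a sketch, not the bgwriter.c code: the names SnapshotState and should_log_snapshot are hypothetical, LSNs are plain integers, and the 15-second interval is taken from the figure mentioned upthread. It also assumes that last_important_lsn is the start of the most recent important record and that the running xacts record itself does not count as important.

```python
from dataclasses import dataclass

# Assumed interval in milliseconds (the 15 seconds mentioned upthread).
LOG_SNAPSHOT_INTERVAL_MS = 15_000


@dataclass
class SnapshotState:
    """Hypothetical bookkeeping for the last running-xacts record."""
    last_snapshot_ts_ms: int = 0  # when the last snapshot record was logged
    last_snapshot_lsn: int = 0    # WAL position just past that record


def should_log_snapshot(state: SnapshotState, now_ms: int,
                        last_important_lsn: int) -> bool:
    """Return True only if BOTH conditions hold.

    last_important_lsn is assumed to be the start of the most recent
    "important" WAL record, so a record written after the last snapshot
    starts at or beyond last_snapshot_lsn.
    """
    enough_time = now_ms >= state.last_snapshot_ts_ms + LOG_SNAPSHOT_INTERVAL_MS
    new_activity = last_important_lsn >= state.last_snapshot_lsn
    return enough_time and new_activity


if __name__ == "__main__":
    state = SnapshotState(last_snapshot_ts_ms=0, last_snapshot_lsn=100)
    # Idle primary: the interval has elapsed, but no important WAL since.
    print("idle:", should_log_snapshot(state, 20_000, 50))
    # Busy primary: the interval has elapsed and a new record starts at 100.
    print("busy:", should_log_snapshot(state, 20_000, 100))
```

In this model an idle primary never satisfies the second condition, no matter how much time passes, which is why the wait cannot be bounded by the interval alone.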

--
With Regards,
Amit Kapila.
