Re: speed up a logical replica setup

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-07-02 12:24:07
Message-ID: CAA4eK1+p+7Ag6nqdFRdqowK1EmJ6bG-MtZQ_54dnFBi=_OO5RQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 1, 2024 at 8:22 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> > I have a different but possibly-related complaint: why is
> > 040_pg_createsubscriber.pl so miserably slow? On my machine it
> > runs for a bit over 19 seconds, which seems completely out of line
> > (for comparison, 010_pg_basebackup.pl takes 6 seconds, and the
> > other test scripts in this directory take much less). It looks
> > like most of the blame falls on this step:
> >
> > [12:47:22.292](14.534s) ok 28 - run pg_createsubscriber on node S
> >
> > AFAICS the amount of data being replicated is completely trivial,
> > so that it doesn't make any sense for this to take so long --- and
> > if it does, that suggests that this tool will be impossibly slow
> > for production use. But I suspect there is a logic flaw causing
> > this.
>
> I analyzed the issue. My elog() debugging showed that wait_for_end_recovery() was
> wasting some time. This was caused by the recovery target not being satisfied.
>
> We are setting recovery_target_lsn from the return value of pg_create_logical_replication_slot(),
> which returns the end of the RUNNING_XACTS record. If we use the returned value as
> recovery_target_lsn as-is, however, we must wait for additional WAL generation
> because the parameter requires that the replicated WAL go past a certain point.
> In my environment, the function waited until the bgwriter emitted the XLOG_RUNNING_XACTS record.
>

IIUC, the problem is that the consistent_lsn value returned by
setup_publisher() is the "end + 1" location of the required record,
whereas the recovery_target_lsn used in wait_for_end_recovery() expects
the LSN value to be the "start" location of the required record.
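
A toy model may make the mismatch easier to see. This is a Python sketch,
not PostgreSQL code. It assumes that, with an inclusive LSN target, replay
stops after the first record whose start position is at or past the target.
The Record type, first_stop_record() and the LSN numbers are made up for
illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    start: int  # LSN where the record begins
    end: int    # "end + 1": LSN just past the record, where the next one would begin


def first_stop_record(wal: List[Record], target_lsn: int) -> Optional[Record]:
    """Return the record after which replay would stop, or None if replay
    has to keep waiting for more WAL (toy model of an inclusive LSN target)."""
    for rec in wal:
        if rec.start >= target_lsn:
            return rec
    return None


# The last record plays the role of the RUNNING_XACTS record.
wal = [Record(100, 140), Record(140, 200)]
consistent_lsn = wal[-1].end  # the "end + 1" value handed back in this model

print(first_stop_record(wal, consistent_lsn))  # None
print(first_stop_record(wal, wal[-1].start))   # Record(start=140, end=200)
```

With the "end + 1" value as the target, no existing record qualifies, so
replay can only finish once some later record shows up.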

> One simple solution is to add an additional WAL record at the end of the publisher
> setup. IIUC, an arbitrary WAL insertion can reduce the waiting time. The attached
> patch inserts a small XLOG_LOGICAL_MESSAGE record, which considerably reduced the
> execution time in my environment.
>

This sounds like an ugly hack to me and I don't know whether we can use
it. The ideal way to fix this is to get the start_lsn from the
create_logical_slot functionality or to have some parameter like
recover_target_end_lsn, but I don't know if this is a good time to
add such functionality.
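
For comparison, the two ways out discussed above can be put side by side in
a small Python sketch. This is a toy model, not PostgreSQL code: WAL records
are (start, end) pairs with invented LSNs, stop_index() is a hypothetical
stand-in for the inclusive recovery_target_lsn check, and it assumes replay
stops at the first record that starts at or past the target.

```python
def stop_index(wal, target_lsn):
    """Index of the record at which replay stops, or None while it must wait."""
    for i, (start, _end) in enumerate(wal):
        if start >= target_lsn:
            return i
    return None


# Publisher WAL right after slot creation; the last record stands in
# for the RUNNING_XACTS record.
wal = [(100, 140), (140, 200)]
end_plus_one = wal[-1][1]

# Status quo: the target is the "end + 1" position and no later record exists.
assert stop_index(wal, end_plus_one) is None

# Workaround: emit one more small record (the patch uses a logical message).
padded = wal + [(200, 230)]
assert stop_index(padded, end_plus_one) == 2

# Alternative: use the record's start as the target; no extra WAL is needed.
assert stop_index(wal, wal[-1][0]) == 1
```

In this model both approaches let replay finish without waiting; the
difference is only whether the publisher has to write an extra record.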

--
With Regards,
Amit Kapila.
