From: | "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com> |
---|---|
To: | 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>, "m(dot)melihmutlu(at)gmail(dot)com" <m(dot)melihmutlu(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "marcos(at)f10(dot)com(dot)br" <marcos(at)f10(dot)com(dot)br>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "smithpb2250(at)gmail(dot)com" <smithpb2250(at)gmail(dot)com> |
Subject: | RE: Time delayed LR (WAS Re: logical replication restrictions) |
Date: | 2022-12-22 06:01:49 |
Message-ID: | TYCPR01MB83730A3E21E921335F6EFA38EDE89@TYCPR01MB8373.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
Hi,
On Thursday, December 15, 2022 12:53 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Dec 15, 2022 at 7:16 AM Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
> wrote:
> >
> > At Wed, 14 Dec 2022 10:46:17 +0000, "Hayato Kuroda (Fujitsu)"
> > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote in
> > > I have implemented and tested that workers wake up per
> > > wal_receiver_timeout/2 and send keepalive. Basically it works well,
> > > but I found two problems.
> > > Do you have any good suggestions about them?
> > >
> > > 1)
> > >
> > > With this PoC at present, workers calculate sending intervals based
> > > on its wal_receiver_timeout, and it is suppressed when the parameter
> > > is set to zero.
> > >
> > > This means that there is a possibility that walsender times out
> > > when wal_sender_timeout in publisher and wal_receiver_timeout in
> > > subscriber are different.
> > > Supposing that wal_sender_timeout is 2min, wal_receiver_timeout is
> > > 5min,
> >
> > It seems to me wal_receiver_status_interval is better for this use.
> > It's enough for us to document that "wal_r_s_interval should be
> > shorter than wal_sender_timeout/2 especially when logical replication
> > connection is using min_apply_delay. Otherwise you will suffer
> > repeated termination of walsender".
> >
>
> This sounds reasonable to me.
Okay, I changed the wake-up interval to wal_receiver_status_interval
and added this description of the timeout to the documentation.
FYI, wal_receiver_status_interval by definition specifies
the minimum frequency at which the WAL receiver process sends information
to the upstream, so I used the value directly for WaitLatch.
My documentation changes follow that definition.
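As a rough illustration of the constraint quoted above (this is not code from
the patch), the rule "wal_receiver_status_interval should be shorter than
wal_sender_timeout/2" can be written as a small check. The function name and
the use of seconds are assumptions made for this sketch; only the two GUC
names and the 2min / 2.5min figures come from the discussion.

```python
def status_interval_is_safe(status_interval_s: float, sender_timeout_s: float) -> bool:
    """Hypothetical helper: is the status interval shorter than wal_sender_timeout / 2?

    Both arguments are in seconds. This only mirrors the rule of thumb
    proposed in the thread; it does not model the walsender itself.
    """
    if sender_timeout_s == 0:
        # wal_sender_timeout = 0 disables the timeout, so it can never expire.
        return True
    if status_interval_s == 0:
        # wal_receiver_status_interval = 0 disables periodic status updates,
        # so nothing would reach the publisher during a long apply delay.
        return False
    return status_interval_s < sender_timeout_s / 2


# Scenario from the thread: wal_sender_timeout = 2min, worker wakes every 2.5min.
print(status_interval_is_safe(150, 120))
# Same timeout, but with a 10s status interval (an example value).
print(status_interval_is_safe(10, 120))
```

With the values from the scenario, the first call reports that the setup
violates the rule and the second reports that it satisfies it.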
> > > and min_apply_delay is 10min. The worker on subscriber will wake up
> > > per 2.5min and send keepalives, but walsender exits before the
> > > message arrives to publisher.
> > >
> > > One idea to avoid that is to send the min_apply_delay subscriber
> > > option to publisher and compare them, but it may be not sufficient.
> > > Because XXX_timeout GUC parameters could be modified later.
> >
> > # Anyway, I don't think such asymmetric setup is preferable.
> >
> > > 2)
> > >
> > > The issue reported by Vignesh-san[1] has still remained. I have
> > > already analyzed that [2], the root cause is that flushed WAL is not
> > > updated and sent to the publisher. Even if workers send keepalive
> > > messages to pub during the delay, the flushed position cannot be modified.
> >
> > I didn't look closer but the cause I guess is walsender doesn't die
> > until all WAL has been sent, while logical delay chokes replication
> > stream.
For issue (2), a separate thread has been started in [1], independent of this one.
I'll leave any further changes on that point to that thread.
Attached is the updated patch.
As before, for the TAP tests I used a basic patch from another thread [2]
that wakes up the logical replication worker.
[1] - https://www.postgresql.org/message-id/TYAPR01MB586668E50FC2447AD7F92491F5E89@TYAPR01MB5866.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/flat/20221122004119.GA132961%40nathanxps13
Best Regards,
Takamichi Osumi
Attachment | Content-Type | Size |
---|---|---|
v11-0001-wake-up-logical-workers-as-needed-instead-of-rel.patch | application/octet-stream | 6.4 KB |
v11-0002-Time-delayed-logical-replication-subscriber.patch | application/octet-stream | 70.7 KB |