From: | Kartyshov Ivan <i(dot)kartyshov(at)postgrespro(dot)ru> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: [HACKERS] make async slave to wait for lsn to be replayed |
Date: | 2024-03-22 19:21:20 |
Message-ID: | 8b67e6514c9ac3562ed675393208506a@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thank you for your feedback.
On 2024-03-20 12:11, Alexander Korotkov wrote:
> On Wed, Mar 20, 2024 at 12:34 AM Kartyshov Ivan
> <i(dot)kartyshov(at)postgrespro(dot)ru> wrote:
>> > 4.2 With an unreasonably high future LSN, BEGIN command waits
>> > unboundedly, shouldn't we check if the specified LSN is more than
>> > pg_last_wal_receive_lsn() error out?
>
> I think limiting wait lsn by current received lsn would destroy the
> whole value of this feature. The value is to wait till given LSN is
> replayed, whether it's already received or not.
Ok sounds reasonable, I`ll rollback the changes.
> But I don't see a problem here. On the replica, it's out of our
> control to check which lsn is good and which is not. We can't check
> whether the lsn, which is in future for the replica, is already issued
> by primary.
>
> For the case of wrong lsn, which could cause potentially infinite
> wait, there is the timeout and the manual query cancel.
Fully agree with this take.
>> > 4.3 With an unreasonably high wait time, BEGIN command waits
>> > unboundedly, shouldn't we restrict the wait time to some max
> value,
>> > say a day or so?
>> > SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
>> > BEGIN AFTER :'future_receive_lsn' WITHIN 100000;
>>
>> Good idea, I put it 1 day. But this limit we should to discuss.
>
> Do you think that specifying timeout in milliseconds is suitable? I
> would prefer to switch to seconds (with ability to specify fraction of
> second). This was expressed before by Alexander Lakhin.
It sounds like an interesting idea. Please review the result.
>> > https://github.com/macdice/redo-bench or similar tools?
>
> Ivan, could you do this?
Yes, test redo-bench/crash-recovery.sh
This patch on master
91.327, 1.973
105.907, 3.338
98.412, 4.579
95.818, 4.19
REL_13-STABLE
116.645, 3.005
113.212, 2.568
117.644, 3.183
111.411, 2.782
master
124.712, 2.047
117.012, 1.736
116.328, 2.035
115.662, 1.797
Strange behavior, patched version is faster then REL_13-STABLE and
master.
> I don't see this change in the patch. Normally if a process gets a
> signal, that causes WaitLatch() to exit immediately. It also exists
> immediately on query cancel. IIRC, this 1 minute timeout is needed to
> handle some extreme cases when an interrupt is missing. Other places
> have it equal to 1 minute. I don't see why we should have it
> different.
Ok, I`ll rollback my changes.
>> 4) added and expanded sections in the documentation
>
> I don't see this in the patch. I see only a short description in
> func.sgml, which is definitely not sufficient. We need at least
> everything we have in the docs before to be adjusted with the current
> approach of procedure.
I didn't find another section where to add the description of
pg_wait_lsn().
So I extend description on the bottom of the table.
>> 5) add default variant of timeout
>> pg_wait_lsn(trg_lsn pg_lsn, delay int8 DEFAULT 0)
>> example: pg_wait_lsn('0/31B1B60') equal pg_wait_lsn('0/31B1B60', 0)
>
> Does zero here mean no timeout? I think this should be documented.
> Also, I would prefer to see the timeout by default. Probably one
> minute would be good for default.
Lets discuss this point. Loop in function WaitForLSN is made that way,
if we choose delay=0, only then we can wait infinitely to wait LSN
without timeout. So default must be 0.
Please take one more look on the patch.
--
Ivan Kartyshov
Postgres Professional: www.postgrespro.com
Attachment | Content-Type | Size |
---|---|---|
v14_0001-Procedure-wait-lsn.patch | text/x-diff | 18.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2024-03-22 19:31:22 | Re: Cannot find a working 64-bit integer type on Illumos |
Previous Message | Robert Haas | 2024-03-22 19:20:41 | Re: documentation structure |