From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-08-04 09:14:33
Message-ID: d0372b74-d7c5-0216-bc0f-23439eb56579@gmail.com
Lists: pgsql-hackers
Hi,
On 7/28/23 4:39 PM, Bharath Rupireddy wrote:
> On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>>> 2. All candidate standbys will start one slot sync worker per logical
>>> slot, which might not be scalable.
>>
>> Yeah, that doesn't sound like a good idea, but IIRC the proposed patch
>> is using one worker per database (for all slots corresponding to a
>> database).
>
> Right. It's based on one worker for each database.
>
>>> Is having one (or a few more - not
>>> necessarily one for each logical slot) worker for all logical slots
>>> enough?
>>
>> I guess for a large number of slots there is a possibility of a large
>> gap in syncing the slots, which probably means we need to retain the
>> corresponding WAL for a much longer time on the primary. If we can
>> prove that the gap won't be large enough to matter, then this would
>> probably be worth considering; otherwise, I think we should find a way
>> to scale the number of workers to avoid the large gap.
>
> I think the gap is largely determined by the time taken to advance
> each slot and the amount of WAL that each logical slot moves ahead on
> the primary.
Sorry to be late, but I gave this a second thought and I wonder if we really need this design
(i.e., starting a logical replication background worker on the standby to sync the slots).
Wouldn't it be simpler to "just" update the synced slots' "metadata",
as the https://github.com/EnterpriseDB/pg_failover_slots module (mentioned by Peter
up-thread) is doing?
(Making use of LogicalConfirmReceivedLocation(), LogicalIncreaseXminForSlot()
and LogicalIncreaseRestartDecodingForSlot(), if I read synchronize_one_slot() correctly.)
> I've measured the time it takes for
> pg_logical_replication_slot_advance with different amounts of WAL on my
> system. It took 2595ms/5091ms/31238ms to advance the slot by
> 3.7GB/7.3GB/13GB respectively. To put things into perspective here,
> imagine there are 3 logical slots to sync for a single slot sync
> worker, and each of them needs its slot advanced by
> 3.7GB/7.3GB/13GB of WAL. The slot sync worker gets to slot 1 again
> after 2595ms+5091ms+31238ms (~40sec), gets to slot 2 again after the
> advance time of slot 1 (covering whatever WAL that slot has fallen
> behind on the primary during those 40sec), gets to slot 3 again after
> the advance times of slot 1 and slot 2, and so on. If WAL generation
> on the primary is pretty fast, and if the logical slots move pretty
> fast on the primary, the time it takes for a single sync worker to
> sync a slot can increase.
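As a side note, this kind of per-slot advance cost can be reproduced from
psql with \timing ("slot1" below is a placeholder slot name):

\timing on
-- Advance the slot through all WAL generated so far; the elapsed time
-- reported by \timing approximates the per-slot sync cost quoted above.
SELECT pg_logical_replication_slot_advance('slot1', pg_current_wal_lsn());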
The metadata-only approach would be way "faster", and we would probably not
need to worry that much about the number of "sync" workers (since they would
"just" have to sync the slots' "metadata"), as proposed above.
Thoughts?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com