Re: Synchronize the dump with a logical slot with --snapshot

From: Justin <zzzzz(dot)graf(at)gmail(dot)com>
To: Durgamahesh Manne <maheshpostgres9(at)gmail(dot)com>
Cc: pgsql-in-general(at)postgresql(dot)org
Subject: Re: Synchronize the dump with a logical slot with --snapshot
Date: 2024-09-28 20:27:19
Message-ID: CALL-XeOY0pd8TcC2_aZ6T3qZfgV9cQkkDO8ZZmmRszXHU8qj+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-in-general

On Sat, Sep 28, 2024 at 3:45 PM Durgamahesh Manne <maheshpostgres9(at)gmail(dot)com>
wrote:

> Hi Justin
>
> We will have to take dump with --snapshot after creating slot then restore
> it on target during which we need to create subscription in disable mode
> with copy_data = false post which make origin advance to progress
> replication finally can go with enable the subscription
>
> Generate snapshot with pg export snapshot function and allocate this to
> slot with commands is very important and critical here to go for next steps
>
> Regards,
> Durga Mahesh
>
> On Sun, 29 Sept, 2024, 00:22 Durgamahesh Manne, <maheshpostgres9(at)gmail(dot)com>
> wrote:
>
>> Hi justin
>>
>> I raised question based on the reference link
>> https://opensource-db.com/unlocking-initial-sync-for-logical-replication-in-aws-rds-for-postgresql/
>> .. you can also go through it to see the steps
>> This worked on postgres 10version but on postgres 14 I can go through the
>> info you provided to implement the same
>>
>> Thanks for your valuable information
>>
>> Regards,
>> Durga Mahesh
>>
>>
>> On Sat, 28 Sept, 2024, 23:10 Justin, <zzzzz(dot)graf(at)gmail(dot)com> wrote:
>>
>>>
>>>
>>> On Sat, Sep 28, 2024 at 1:04 AM Durgamahesh Manne <
>>> maheshpostgres9(at)gmail(dot)com> wrote:
>>>
>>>> Hi Team
>>>>
>>>> Can anyone respond to my question from respected team members ?
>>>>
>>>> Durga Mahesh
>>>>
>>>> On Thu, Sep 26, 2024 at 2:23 AM Durgamahesh Manne <
>>>> maheshpostgres9(at)gmail(dot)com> wrote:
>>>>
>>>>> Hi Team
>>>>>
>>>>> --snapshot=snapshotname
>>>>> (Use the specified synchronized snapshot when making a dump of the
>>>>> database
>>>>>
>>>>> This option is useful when needing to synchronize the dump with a
>>>>> logical replication slot) as per the pgdg
>>>>>
>>>>> How do we synchronize the dump with a logical replication slot with
>>>>> --snapshot?
>>>>>
>>>>> I am using the postgresql 14 version which supports only
>>>>> pg_create_logical_replication_slot. How to generate a internal snapshot
>>>>> with it ?
>>>>>
>>>>> Below CREATE_REPLICAION_SLOT not supported by postgresql 14
>>>>>
>>>>> example osdb_lsr=# CREATE_REPLICATION_SLOT lsr_sync_01 LOGICAL
>>>>> pgoutput;
>>>>> slot_name | consistent_point | snapshot_name | output_plugin
>>>>> -------------+------------------+---------------------+---------------
>>>>> lsr_sync_01 | 0/C000110 | 00000003-00000002-1 | pgoutput
>>>>>
>>>>> Regards,
>>>>>
>>>>> Durga Mahesh
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 20 Sept, 2024, 01:27 Durgamahesh Manne, <
>>>>> maheshpostgres9(at)gmail(dot)com> wrote:
>>>>>
>>>>>> Hi Team
>>>>>>
>>>>>> --snapshot=*snapshotname*
>>>>>>
>>>>>> (Use the specified synchronized snapshot when making a dump of the
>>>>>> database
>>>>>>
>>>>>> This option is useful when needing to synchronize the dump with a
>>>>>> logical replication slot) as per the pgdg
>>>>>>
>>>>>> How do we synchronize the dump with a logical replication slot
>>>>>> with --snapshot?
>>>>>>
>>>>>> I am using the postgresql 14 version which supports only
>>>>>> pg_create_logical_replication_slot. How to generate a snapshot with it ?
>>>>>>
>>>>>> Below CREATE_REPLICAION_SLOT not supported by postgresql 14
>>>>>>
>>>>>> example osdb_lsr=# CREATE_REPLICATION_SLOT lsr_sync_01 LOGICAL
>>>>>> pgoutput;
>>>>>> slot_name | consistent_point | snapshot_name | output_plugin
>>>>>> -------------+------------------+---------------------+---------------
>>>>>> lsr_sync_01 | 0/C000110 | 00000003-00000002-1 | pgoutput
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Durga Mahesh
>>>>>>
>>>>>
>>>
>>> Hi Durgamahesh,
>>>
>>> I am not sure what you are after with matching pg_dump and replication
>>> slot together unless you are trying to get a dump to handle the initial
>>> data sync. There is not a benefit to doing that as the WAL is going to
>>> build up on the publisher...
>>>
>>> You have to create a snapshot using the export function
>>>
>>> https://www.postgresql.org/docs/current/sql-set-transaction.html
>>>
>>>
>>> https://www.postgresql.org/docs/17/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION
>>>
>>> Then you can create the logical replication slot with using that
>>> slotname option
>>>
>>> https://www.postgresql.org/docs/17/sql-createsubscription.html#SQL-CREATESUBSCRIPTION-PARAMS-WITH-SLOT-NAME
>>> and no sync option.
>>>
>>> Then you tell pg_dump to use that snapshot name snapshot with this option
>>> --snapshot=snapshotname
>>>
>>> https://www.postgresql.org/docs/current/app-pgdump.html
>>>
>>> Once pg_restore is done on the destination , you can create a
>>> subscription using that slotname option probably and specify copy_data =
>>> false.
>>>
>>> Keep in mind the WAL will build up during this process, not sure what
>>> the benefit would be just allowing logical replication to do the initial
>>> sync.
>>>
>>>
>>> Thanks
>>> Justin
>>>
>>
Hi Durgamahesh,

The article does not state what is being gained following these
instructions over just letting LR do the initial sync...

And using pg_baseback to create the initial data set along with all these
moving pieces. This article made this way harder than it has to be,
create a read replica, using pg_basebackup, then convert it to an LR. PG
17 now includes a binary to do this for us. I have converted a Read
Replica to LR replica several times in just 4 steps.

I don't see why using pg_basebackup is a better solution than letting the
initial sync handle the data copy.

Using pg_dump to do initial data copy over LR initial sync what is being
gained???

Don't forget the WAL will be building up on the publisher using
pg_basebackup and pg_dump as the LR slot has to exist..

I have done 50 TB size databases with LR...

I would not be following these instructions the assertions are not true

*Snapshot: Copies all data at once. Simple but slow for large datasets. *This
is not how LR initial sync works and can increase the number of sync
workers to increase speed. On very large tables drop the indexes that are
not the replica identity and recreate them after the table is in sync. I
only do this when a specific table will take several days to copy over
because it is several TB in size..

Base Backup and Continuous Archiving*: Makes a base copy and then tracks
changes. More suitable for large datasets. ---*Don't know how this is
true, Data size is data size and the time to copy does not really change.
There is a tiny saving by not having to recreate the indexes, is
this approach really worth it. Lots of manual moving pieces and making
sure both databases are on the same LSN...

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message yudhi s 2024-09-28 21:37:17 Context variable in application and trigger code
Previous Message Durgamahesh Manne 2024-09-28 19:45:24 Re: Synchronize the dump with a logical slot with --snapshot

Browse pgsql-in-general by date

  From Date Subject
Next Message Durgamahesh Manne 2024-09-29 04:13:44 Re: Synchronize the dump with a logical slot with --snapshot
Previous Message Durgamahesh Manne 2024-09-28 19:45:24 Re: Synchronize the dump with a logical slot with --snapshot