Re: Logical Replication speed-up initial data

From: Nikhil Shetty <nikhil(dot)dba04(at)gmail(dot)com>
To: Avinash Kumar <avinash(dot)vallarapu(at)gmail(dot)com>
Cc: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>, Hüseyin Demir <demirhuseyinn(dot)94(at)gmail(dot)com>, Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical Replication speed-up initial data
Date: 2021-08-05 18:41:20
Message-ID: CAFpL5Vx3e8Axz5vvSAAJ45xTtMwC1sXZ6T7TYtwZ0XnNPyVwtQ@mail.gmail.com
Lists: pgsql-performance

Hi Avinash,

Thank you for the detailed explanation.

Indexes were dropped on the destination to speed up the initial data load.
We cannot stop the app on the source, as it is highly transactional.
I had thought about this method, but I am not sure from which point
logical replication will start after the pg_restore; we cannot afford to
lose any data.
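For reference, the start point should be visible on the source once the slot exists; a rough sketch (the slot name is a placeholder) that I would want to verify first:

```sql
-- On the source: the slot records the position from which streaming
-- resumes once the subscription is enabled. Changes made after the
-- slot was created are retained by it, so they are not lost while
-- the dump/restore is in progress.
SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'some_slot_name';
```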

I will give this method a test, though, and check how it works.

Thanks,
Nikhil

On Thu, Aug 5, 2021 at 8:42 PM Avinash Kumar <avinash(dot)vallarapu(at)gmail(dot)com>
wrote:

> Hi,
>
> On Thu, Aug 5, 2021 at 11:28 AM Vijaykumar Jain <
> vijaykumarjain(dot)github(at)gmail(dot)com> wrote:
>
>> On Thu, 5 Aug 2021 at 10:27, Nikhil Shetty <nikhil(dot)dba04(at)gmail(dot)com>
>> wrote:
>>
>>> Hi,
>>>
>>> Thank you for the suggestion.
>>>
>>> We tried by dropping indexes and it worked faster compared to what we
>>> saw earlier. We wanted to know if anybody has done any other changes that
>>> helps speed-up initial data load without dropping indexes.
>>>
>>>
> You could leverage pg_basebackup, or pg_dump with parallel jobs,
> taken from a standby (preferably with replication paused if using pg_dump;
> pg_basebackup should be straightforward either way) or even from the
> primary, for the purpose of the initial data load.
>
> As you are able to drop indexes and make some schema changes, I would
> assume that you could pause your app temporarily. If that's the case,
> you may look into the simple steps I am posting here, which demonstrate
> pg_dump/pg_restore instead.
>
> If you cannot pause the app, then you could look into using
> pg_replication_origin_advance
> <https://www.postgresql.org/docs/13/functions-admin.html#PG-REPLICATION-ORIGIN-ADVANCE>
>
> Step 1 : Pause the app.
> Step 2 : Create the publication on the primary:
> CREATE PUBLICATION <some_pub_name> FOR ALL TABLES;
> Step 3 : Create a logical replication slot on the primary:
> SELECT * FROM pg_create_logical_replication_slot('<some_slot_name>', 'pgoutput');
> Step 4 : Create the subscription, but do not enable it:
> CREATE SUBSCRIPTION <some_sub_name>
> CONNECTION 'host=<some_host> dbname=<some_db> user=postgres password=secret port=5432'
> PUBLICATION <some_pub_name>
> WITH (copy_data = false, create_slot = false, enabled = false,
> slot_name = '<some_slot_name>');
>
> Step 5 : Initiate pg_dump. You can take a parallel dump for a faster
> restore.
> $ pg_dump -d <some_db> -Fd -j 4 -n <some_schema> -f <some_unique_directory>
> -- If it is several hundreds of GBs or TBs, you may rather utilize one of
> your standbys, paused from replication using -> SELECT pg_wal_replay_pause();
> Step 6 : You don't need to wait until pg_dump completes; you may start the
> app.
> -- Hope the app does not perform changes that impact the pg_dump or
> get blocked due to pg_dump.
> Step 7 : Restore the dump if you used pg_dump.
> pg_restore -d <some_db> -j <some_number_of_parallel_jobs> <some_directory>
> Step 8 : Enable the subscription.
> ALTER SUBSCRIPTION <some_sub_name> ENABLE;
>
> If you have not stopped your app, then you must advance the LSN using
> pg_replication_origin_advance
> <https://www.postgresql.org/docs/13/functions-admin.html#PG-REPLICATION-ORIGIN-ADVANCE>
>
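> To make the advance concrete, it would look roughly like this on the
> subscriber (the origin name and LSN below are placeholders; advancing
> past the wrong LSN silently skips changes, so verify carefully):
>
> ```sql
> -- List replication origins; a subscription's origin is named pg_<oid>.
> SELECT * FROM pg_replication_origin;
> -- While the subscription is still disabled, advance the origin to the
> -- LSN the slot was at when the dump was taken.
> SELECT pg_replication_origin_advance('pg_16389', '0/1A2B3C4D');
> ```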
>
> These are all hand-written steps drafted while writing this email, so
> please test them on your end, as some typos or adjustments are definitely
> expected.
>
>> PS: I have not tested this with production-level loads; it was just an
>> experiment I did on my laptop.
>>
>> One option would be to use the pglogical extension (this was shared by
>> Dharmendra in one of the previous mails, sharing the same),
>> and then use the pglogical_create_subscriber CLI to create the initial
>> copy via pg_basebackup and then carry on from there.
>> I ran a test case similar to the one below in my local environment, and
>> it seems to work fine. Of course, I do not have TBs' worth of load to
>> test with, but it looks promising,
>> especially since they introduced it to the core.
>> pglogical/010_pglogical_create_subscriber.pl at REL2_x_STABLE ·
>> 2ndQuadrant/pglogical (github.com)
>> <https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/t/010_pglogical_create_subscriber.pl>
>> Once you attain a reasonable sync state, you can drop the pglogical
>> extension and check whether things continue fine.
>> I have done something similar when upgrading from 9.6 to 11 using
>> pglogical and then dropping the extension, and it was smooth;
>> maybe you can try this out and share whether things work fine.
>> See also:
>> The 1-2-3 for PostgreSQL Logical Replication Using an RDS Snapshot -
>> Percona Database Performance Blog
>> <https://www.percona.com/blog/postgresql-logical-replication-using-an-rds-snapshot/>
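>> For reference, the invocation is roughly as below (the data directory and
>> DSNs are placeholders from my laptop test; check the pglogical docs for
>> your version's exact options):
>>
>> ```shell
>> pglogical_create_subscriber \
>>   --pgdata=/var/lib/pgsql/13/data \
>>   --subscriber-name=sub1 \
>>   --subscriber-dsn="host=replica dbname=mydb user=postgres" \
>>   --provider-dsn="host=primary dbname=mydb user=postgres"
>> ```
>> It takes a pg_basebackup of the provider, converts it into a pglogical
>> subscriber, and then streams the changes made since the base backup.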
>>
>>
>
> --
> Regards,
> Avinash Vallarapu (Avi)
> CEO,
> MigOps, Inc.
>
