From: | Nikhil Shetty <nikhil(dot)dba04(at)gmail(dot)com> |
---|---|
To: | Avinash Kumar <avinash(dot)vallarapu(at)gmail(dot)com> |
Cc: | Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>, Hüseyin Demir <demirhuseyinn(dot)94(at)gmail(dot)com>, Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical Replication speed-up initial data |
Date: | 2021-08-05 18:41:20 |
Message-ID: | CAFpL5Vx3e8Axz5vvSAAJ45xTtMwC1sXZ6T7TYtwZ0XnNPyVwtQ@mail.gmail.com |
Lists: | pgsql-performance |
Hi Avinash,
Thank you for the detailed explanation.
Indexes were dropped on the destination to speed up the initial data load.
We cannot stop the App on the source, and it is highly transactional.
I had thought about this method, but I am not sure from which point logical
replication will start after the pg_restore; we cannot afford to lose any
data.
I will give this method a test, though, and check how it works.
Thanks,
Nikhil
On Thu, Aug 5, 2021 at 8:42 PM Avinash Kumar <avinash(dot)vallarapu(at)gmail(dot)com>
wrote:
> Hi,
>
> On Thu, Aug 5, 2021 at 11:28 AM Vijaykumar Jain <
> vijaykumarjain(dot)github(at)gmail(dot)com> wrote:
>
>> On Thu, 5 Aug 2021 at 10:27, Nikhil Shetty <nikhil(dot)dba04(at)gmail(dot)com>
>> wrote:
>>
>>> Hi,
>>>
>>> Thank you for the suggestion.
>>>
>>> We tried by dropping indexes and it worked faster compared to what we
>>> saw earlier. We wanted to know if anybody has done any other changes that
>>> helps speed-up initial data load without dropping indexes.
>>>
>>>
> You could leverage pg_basebackup or pg_dump with parallel jobs,
> taken from a Standby (preferably with replication paused if using pg_dump;
> pg_basebackup should be straightforward either way), or even from the
> Primary, for the purpose of the initial data load.
>
> As you are able to drop indexes and make some schema changes, I would
> assume that you could pause your app temporarily. If that's the case,
> you may look into the simple steps I am posting here that demonstrate
> pg_dump/pg_restore instead.
>
> If you cannot pause the app, then you could look into using
> pg_replication_origin_advance
> <https://www.postgresql.org/docs/13/functions-admin.html#PG-REPLICATION-ORIGIN-ADVANCE>
>
>
> Step 1 : Pause the App.
> Step 2 : Create the Publication on the Primary.
> CREATE PUBLICATION <some_pub_name> FOR ALL TABLES;
> Step 3 : Create a Logical Replication Slot on the Primary.
> SELECT * FROM pg_create_logical_replication_slot('<some_slot_name>', 'pgoutput');
> Step 4 : Create the Subscription, but do not enable it.
> CREATE SUBSCRIPTION <some_sub_name>
> CONNECTION 'host=<some_host> dbname=<some_db> user=postgres password=secret port=5432'
> PUBLICATION <some_pub_name>
> WITH (copy_data = false, create_slot = false, enabled = false, slot_name = <some_slot_name>);
>
> Step 5 : Initiate pg_dump. We can take a parallel backup for a faster
> restore.
>
> $ pg_dump -d <some_db> -Fd -j 4 -n <some_schema> -f <some_unique_directory>
> -- If it's several hundreds of GBs or TBs, you may rather utilize one of
> your Standbys, with replay paused using -> SELECT pg_wal_replay_pause();
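> A rough sketch of that standby variant (assuming the dump is run against
> the paused standby; the pg_wal_replay_resume() at the end is my addition,
> not part of the steps above):

```sql
-- On the standby, pause WAL replay so the data stops changing under the dump:
SELECT pg_wal_replay_pause();

-- ... run pg_dump against this standby ...

-- Once the dump has finished, resume replay so the standby catches up:
SELECT pg_wal_replay_resume();
```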
>
> Step 6 : You don't need to wait until pg_dump completes; you may start the
> App.
> -- Hopefully the app does not perform changes that impact the pg_dump or
> get blocked by the pg_dump.
> Step 7 : Restore the dump if you used pg_dump.
> pg_restore -d <some_db> -j <some_number_of_parallel_jobs> <some_directory>
> Step 8 : Enable the subscription.
> ALTER SUBSCRIPTION <some_sub_name> ENABLE;
>
> If you have not stopped your app, then you must advance the LSN using
> pg_replication_origin_advance
> <https://www.postgresql.org/docs/13/functions-admin.html#PG-REPLICATION-ORIGIN-ADVANCE>
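> A minimal sketch of advancing the origin (names and the LSN are
> placeholders; on the subscriber, the origin for a subscription is
> typically named pg_<subscription_oid>, and the subscription must remain
> disabled while you do this):

```sql
-- On the subscriber: list replication origins to find the one created
-- for the subscription
SELECT roident, roname FROM pg_replication_origin;

-- Advance the origin to the LSN returned by
-- pg_create_logical_replication_slot in Step 3, so changes already
-- contained in the dump are not applied a second time.
SELECT pg_replication_origin_advance('pg_16389', '0/1A2B3C4D');
```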
>
>
> These are all hand-written steps drafted while writing this email, so
> please test them on your end; some typos or adjustments are definitely
> expected.
>
>> PS: I have not tested this with production-level loads; it was just an
>> experiment I did on my laptop.
>>
>> One option would be to use the pglogical extension (this was shared by
>> Dharmendra in one of the previous mails, sharing the same),
>> and then use the pglogical_create_subscriber CLI to create the initial copy
>> via pg_basebackup and then carry on from there.
>> I ran a test case similar to the one below in my local env, and it seems to
>> work fine. Of course, I do not have TBs worth of load to test, but it looks
>> promising,
>> especially since they introduced it to the core.
>> pglogical/010_pglogical_create_subscriber.pl at REL2_x_STABLE ·
>> 2ndQuadrant/pglogical (github.com)
>> <https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/t/010_pglogical_create_subscriber.pl>
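>> For reference, an invocation looks roughly like this (DSNs, the
>> subscriber name, and the data directory are placeholders; check the
>> pglogical docs for the exact options in your version):

```shell
pglogical_create_subscriber \
  --subscriber-name=sub1 \
  --subscriber-dsn="host=replica-host dbname=mydb user=postgres" \
  --provider-dsn="host=primary-host dbname=mydb user=postgres" \
  --pgdata=/var/lib/postgresql/13/sub1_data
```

>> It takes a pg_basebackup of the provider into --pgdata and then converts
>> that physical copy into a pglogical subscriber, which is what makes the
>> initial sync fast.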
>> Once you attain a reasonable sync state, you can drop the pglogical
>> extension and check whether things continue fine.
>> I have done something similar when upgrading from 9.6 to 11 using
>> pglogical and then dropping the extension, and it was smooth;
>> maybe you can try this out and share whether things work fine.
>> and
>> The 1-2-3 for PostgreSQL Logical Replication Using an RDS Snapshot -
>> Percona Database Performance Blog
>> <https://www.percona.com/blog/postgresql-logical-replication-using-an-rds-snapshot/>
>>
>>
>
> --
> Regards,
> Avinash Vallarapu (Avi)
> CEO,
> MigOps, Inc.
>