From: | Avinash Kumar <avinash(dot)vallarapu(at)gmail(dot)com> |
---|---|
To: | Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com> |
Cc: | Nikhil Shetty <nikhil(dot)dba04(at)gmail(dot)com>, Hüseyin Demir <demirhuseyinn(dot)94(at)gmail(dot)com>, Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical Replication speed-up initial data |
Date: | 2021-08-05 15:11:50 |
Message-ID: | CAN0TujcvY8+A08Nd63YQBDB_H6OYgmYNJ+Q21jrFNiG16a49tw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi,
On Thu, Aug 5, 2021 at 11:28 AM Vijaykumar Jain <
vijaykumarjain(dot)github(at)gmail(dot)com> wrote:
> On Thu, 5 Aug 2021 at 10:27, Nikhil Shetty <nikhil(dot)dba04(at)gmail(dot)com> wrote:
>
>> Hi,
>>
>> Thank you for the suggestion.
>>
>> We tried by dropping indexes and it worked faster compared to what we saw
>> earlier. We wanted to know if anybody has done any other changes that helps
>> speed-up initial data load without dropping indexes.
>>
>>
> You could leverage pg_basbeackup or pg_dump with parallel jobs
taken from a Standby (preferably replication paused if pg_dump, anyways
pg_basebackup should be straight-forward) or taken even from
Primary, for the purpose of initial data load.
As you are able to drop indexes and make some schema changes, I would
assume that you could pause your app temporarily. If that's the case
you may look into the simple steps i am posting here that demonstrates
pg_dump/pg_restore instead.
If you cannot pause the app, then, you could look into how you
could use pg_replication_origin_advance
<https://www.postgresql.org/docs/13/functions-admin.html#PG-REPLICATION-ORIGIN-ADVANCE>
Step 1 : Pause App
Step 2 : Create Publication on the Primary CREATE PUBLICATION
<some_pub_name> FOR ALL TABLES;
Step 3 : Create Logical Replication Slot on the Primary SELECT * FROM
pg_create_logical_replication_slot('<some_slot_name>', 'pgoutput'); Step 4
: Create Subscription but do not enable the Subscription
CREATE SUBSCRIPTION <some_sub_name> CONNECTION
'host=<some_host> dbname=<some_db> user=postgres
password=secret port=5432' PUBLICATION <some_pub_name>
WITH (copy_data = false, create_slot=false, enabled=false,
slot_name=<some_slot_name>);
Step 5 : Initiate pg_dump. We can take a parallel backup for a faster
restore.
$ pg_dump -d <some_db> -Fd -j 4 -n <some_schema> -f <some_unique_directory>
-- If its several hundreds of GBs or TBs, you may rather utilize one of
your Standby that has been paused from replication using -> select
pg_wal_replay_pause();
Step 6 : Don't need to wait until pg_dump completes, you may start the App.
-- Hope the app does not perform changes that impact the pg_dump or
gets blocked due to pg_dump.
Step 7 : Restore the dump if you used pg_dump.
pg_restore -d <some_db> -j <some_numer_of_parallel_jobs> <some_directory> Step
8 : Enable subscription.
ALTER SUBSCRIPTION <some_sub_name> ENABLE;
If you have not stopped your app then you must advance the lsn using
pg_replication_origin_advance
<https://www.postgresql.org/docs/13/functions-admin.html#PG-REPLICATION-ORIGIN-ADVANCE>
These are all hand-written steps while drafting this email, so,
please test it on your end as some typos or adjustments are definitely
expected.
PS: i have not tested this in production level loads, it was just some exp
> i did on my laptop.
>
> one option would be to use pglogical extension (this was shared by
> Dharmendra in one the previous mails, sharing the same),
> and then use pglogical_create_subscriber cli to create the initial copy
> via pgbasebackup and then carry on from there.
> I ran the test case similar to one below in my local env, and it seems to
> work fine. of course i do not have TB worth of load to test, but it looks
> promising,
> especially since they introduced it to the core.
> pglogical/010_pglogical_create_subscriber.pl at REL2_x_STABLE ·
> 2ndQuadrant/pglogical (github.com)
> <https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/t/010_pglogical_create_subscriber.pl>
> Once you attain some reasonable sync state, you can drop the pglogical
> extension, and check if things continue fine.
> I have done something similar when upgrading from 9.6 to 11 using
> pglogical and then dropping the extension and it was smooth,
> maybe you need to try this out and share if things works fine.
> and
> The 1-2-3 for PostgreSQL Logical Replication Using an RDS Snapshot -
> Percona Database Performance Blog
> <https://www.percona.com/blog/postgresql-logical-replication-using-an-rds-snapshot/>
>
>
--
Regards,
Avinash Vallarapu (Avi)
CEO,
MigOps, Inc.
From | Date | Subject | |
---|---|---|---|
Next Message | Nikhil Shetty | 2021-08-05 18:41:20 | Re: Logical Replication speed-up initial data |
Previous Message | Vijaykumar Jain | 2021-08-05 14:27:42 | Re: Logical Replication speed-up initial data |