Re: Logical Replication speed-up initial data

From: Avinash Kumar <avinash(dot)vallarapu(at)gmail(dot)com>
To: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
Cc: Nikhil Shetty <nikhil(dot)dba04(at)gmail(dot)com>, Hüseyin Demir <demirhuseyinn(dot)94(at)gmail(dot)com>, Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical Replication speed-up initial data
Date: 2021-08-05 15:11:50
Message-ID: CAN0TujcvY8+A08Nd63YQBDB_H6OYgmYNJ+Q21jrFNiG16a49tw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

On Thu, Aug 5, 2021 at 11:28 AM Vijaykumar Jain <
vijaykumarjain(dot)github(at)gmail(dot)com> wrote:

> On Thu, 5 Aug 2021 at 10:27, Nikhil Shetty <nikhil(dot)dba04(at)gmail(dot)com> wrote:
>
>> Hi,
>>
>> Thank you for the suggestion.
>>
>> We tried by dropping indexes and it worked faster compared to what we saw
>> earlier. We wanted to know if anybody has done any other changes that helps
>> speed-up initial data load without dropping indexes.
>>
>>
> You could leverage pg_basbeackup or pg_dump with parallel jobs
taken from a Standby (preferably replication paused if pg_dump, anyways
pg_basebackup should be straight-forward) or taken even from
Primary, for the purpose of initial data load.

As you are able to drop indexes and make some schema changes, I would
assume that you could pause your app temporarily. If that's the case
you may look into the simple steps i am posting here that demonstrates
pg_dump/pg_restore instead.

If you cannot pause the app, then, you could look into how you
could use pg_replication_origin_advance
<https://www.postgresql.org/docs/13/functions-admin.html#PG-REPLICATION-ORIGIN-ADVANCE>

Step 1 : Pause App
Step 2 : Create Publication on the Primary CREATE PUBLICATION
<some_pub_name> FOR ALL TABLES;
Step 3 : Create Logical Replication Slot on the Primary SELECT * FROM
pg_create_logical_replication_slot('<some_slot_name>', 'pgoutput'); Step 4
: Create Subscription but do not enable the Subscription
CREATE SUBSCRIPTION <some_sub_name> CONNECTION
'host=<some_host> dbname=<some_db> user=postgres
password=secret port=5432' PUBLICATION <some_pub_name>
WITH (copy_data = false, create_slot=false, enabled=false,
slot_name=<some_slot_name>);

Step 5 : Initiate pg_dump. We can take a parallel backup for a faster
restore.

$ pg_dump -d <some_db> -Fd -j 4 -n <some_schema> -f <some_unique_directory>
-- If its several hundreds of GBs or TBs, you may rather utilize one of
your Standby that has been paused from replication using -> select
pg_wal_replay_pause();

Step 6 : Don't need to wait until pg_dump completes, you may start the App.
-- Hope the app does not perform changes that impact the pg_dump or
gets blocked due to pg_dump.
Step 7 : Restore the dump if you used pg_dump.
pg_restore -d <some_db> -j <some_numer_of_parallel_jobs> <some_directory> Step
8 : Enable subscription.
ALTER SUBSCRIPTION <some_sub_name> ENABLE;

If you have not stopped your app then you must advance the lsn using
pg_replication_origin_advance
<https://www.postgresql.org/docs/13/functions-admin.html#PG-REPLICATION-ORIGIN-ADVANCE>

These are all hand-written steps while drafting this email, so,
please test it on your end as some typos or adjustments are definitely
expected.

PS: i have not tested this in production level loads, it was just some exp
> i did on my laptop.
>
> one option would be to use pglogical extension (this was shared by
> Dharmendra in one the previous mails, sharing the same),
> and then use pglogical_create_subscriber cli to create the initial copy
> via pgbasebackup and then carry on from there.
> I ran the test case similar to one below in my local env, and it seems to
> work fine. of course i do not have TB worth of load to test, but it looks
> promising,
> especially since they introduced it to the core.
> pglogical/010_pglogical_create_subscriber.pl at REL2_x_STABLE ·
> 2ndQuadrant/pglogical (github.com)
> <https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/t/010_pglogical_create_subscriber.pl>
> Once you attain some reasonable sync state, you can drop the pglogical
> extension, and check if things continue fine.
> I have done something similar when upgrading from 9.6 to 11 using
> pglogical and then dropping the extension and it was smooth,
> maybe you need to try this out and share if things works fine.
> and
> The 1-2-3 for PostgreSQL Logical Replication Using an RDS Snapshot -
> Percona Database Performance Blog
> <https://www.percona.com/blog/postgresql-logical-replication-using-an-rds-snapshot/>
>
>

--
Regards,
Avinash Vallarapu (Avi)
CEO,
MigOps, Inc.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Nikhil Shetty 2021-08-05 18:41:20 Re: Logical Replication speed-up initial data
Previous Message Vijaykumar Jain 2021-08-05 14:27:42 Re: Logical Replication speed-up initial data