Re: Logical replication performance

From: Flavio Henrique Araque Gurgel <fhagur(at)gmail(dot)com>
To: Florian Philippon <florian(dot)philippon(at)doctolib(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Logical replication performance
Date: 2019-11-29 16:25:43
Message-ID: CAGHTAeORhnN99pcZ=6_GDsJt6TqTeqki8rqW10h9cz0+3Us8=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Em sex., 29 de nov. de 2019 às 17:06, Florian Philippon <
florian(dot)philippon(at)doctolib(dot)com> escreveu:

> Hello community!
>
> Hi Florian

> We are currently testing PostgreSQL 11's built-in logical replication. We
> are trying to initialize a subscriber (from scratch) from a publisher with
> a large database (+6TB) with around 220 tables.
>
> We tweaked the configuration parameters below, both on publisher and
> subscriber, in order to minimize the initial copy data phase delay:
>
> - max_replication_slots
> - max_wal_senders
> - max_wal_size
> - max_worker_processes
> - max_logical_replication_workers
> - max_sync_workers_per_subscription
> - max_worker_processes
>
> The two PostgreSQL instances are using the same hardware: 48 vCPU, 384 GB
> ram, 10GB network and same version of software (PostgreSQL 11.6).
>
> We pre-loaded the full schema of the database (with indexes and
> constraints) on the subscriber since it's mandatory to have the logical
> replication working.
>
> However, the initial copy data phase is quite long (+2 days and still
> running) for largest tables in the database. There is no load on the
> publisher since it's a staging environment.
> We noticed that logical replication workers processes on the subscriber
> can reach more than 90% CPU usage per worker.
>
> We understand that we cannot have more than one worker per table running
> but we would like to know if there is anything that could help us to
> achieve this initial copy phase more quickly.
>
> We tried another solution: we loaded a minimal schema (without indexes and
> constraints) on the subscriber and created the subscription. The initial
> copy phase was way faster (a few hours). Then we created indexes and
> constraints. Is this a suitable solution for production? Will the logical
> replication flow be buffered by the replication slots during index creation
> and get in sync afterwards or will it conflict due to locking issues?
>
>
You can try the pg_dump over a snapshot and use parallel restore
(pg_restore -j option) to your initial data load, it should be much faster
than an initial sync. Take a look here:
https://www.postgresql.org/docs/11/logicaldecoding-explanation.html#id-1.8.14.8.5

Best,
Flavio

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Eugene Podshivalov 2019-11-30 19:47:02 Re: Considerable performance downgrade of v11 and 12 on Windows
Previous Message Florian Philippon 2019-11-29 16:06:34 Logical replication performance