Re: Fast logical replication jump start with PG 10

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Olivier Gautherot <olivier(at)gautherot(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Fast logical replication jump start with PG 10
Date: 2018-05-26 18:00:33
Message-ID: a01d4eeb-21c5-d235-a18c-fe476d2db6a9@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 05/26/2018 06:23 AM, Olivier Gautherot wrote:
> Hi Adrian!
>
> On Sat, May 26, 2018 at 12:11 AM, Adrian Klaver
> <adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>> wrote:
>
> On 05/25/2018 06:35 PM, Olivier Gautherot wrote:
>
> Hi Adrian, thanks for your reply. Here is the clarification.
>
> 1) It is indeed a pg_upgrade from 9.2 to 10.4. Depending on the
> test machine, it runs in between 15 and 20 minutes for just over
> 100GB. I can negotiate this time with our customer. The vacuum
> process took another 5 to 7 minutes. This this what I was
> referring to with the 30 minutes (point 3 in your questions)
>
> 2) After pg_upgrade, I published the tables on the database (in
> the sense "CREATE DATABASE") and subscribed to this publication
> on the second server (logical replication). The data copy
> processed started immediately and took around 1 hour. I then
> loaded the indexes, what took > another 2h20m. At that point the
> active-passive cluster was ready to go.
>
>
> The index creation was done on the replicated machine I presume,
> using what command?
>
>
> The sequence on the replicated machine was (pseudo-code to simplify the
> syntax):
> - pg_dump --section=pre-data -h master_machine master_database | psql -h
> replication_machine replication_database
> # This took seconds, "pre-data" discards the indexes
>
> - psql -h replication_machine -c "CREATE SUBSCRIPTION mysub CONNECTION
> "..." PUBLICATION mypub;" replication_database
> # This took about 1 hour for the initial sync
>
> - pg_dump --section=post-data -h master_machine master_database | psql
> -h replication_machine replication_database
> # This took 2h20m to load the various indexes
>
> This sequence follows the recommendation of section 14.4.3 in
> https://www.postgresql.org/docs/10/static/populate.html . If I stick to
> streaming as we do today (e.g. pg_upgrade and then rsync to the
> replication server), I can be ready in about 1 hour (more acceptable for
> the customer).
>

Just realized that by setting up the streaming as above you are already
doing basically the same thing as I suggested in my previous post.
Streaming and logical replication can exist at the same time:

https://www.postgresql.org/docs/10/static/logical-replication.html

"Logical replication is a method of replicating data objects and their
changes, based upon their replication identity (usually a primary key).
We use the term logical in contrast to physical replication, which uses
exact block addresses and byte-by-byte replication. PostgreSQL supports
both mechanisms concurrently, see Chapter 26. Logical replication allows
fine-grained control over both data replication and security."

So you could set up the logical replication after the streaming is done
using the copy_data=false clause and been done in a relatively short
period of time. At that point you could decide whether to keep the
streaming running or not.

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Martín Marqués 2018-05-26 18:13:45 Re: Fast logical replication jump start with PG 10
Previous Message Adrian Klaver 2018-05-26 17:27:51 Re: Fast logical replication jump start with PG 10