Re: pg_rewind copy so much data

From: Hung Phan <hungphan227(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_rewind copy so much data
Date: 2017-09-13 05:21:14
Message-ID: CANHVDh05yHdW6_5UrJ5snN0aQ4SSYMEa9j0e6gg0EbrVECzTjw@mail.gmail.com
Lists: pgsql-general

Hi,

Thanks for your response. I have just repeated the master/slave switchover
once more:

- one master and one slave (total size of each server is more than 4GB).
The last log line on the slave is currently "started streaming WAL from
primary at 2/D6000000 on timeline 10".

- stop the master; the slave shows the following logs:
replication terminated by primary server
End of WAL reached on timeline 10 at 2/D69304D0
Invalid record length at 2/D69304D0
could not connect to primary server

- promote the slave:
receive promote request
redo done at 2/D6930460
selected new timeline ID: 11
archive recovery complete
MultiXact member wraparound protections are now enabled
database system is ready to accept connections
autovacuum launcher started

- start and stop the old master, then run pg_rewind (all executed
immediately after promoting the slave). Output of pg_rewind:
servers diverged at WAL position 2/D69304D0 on timeline 10
rewinding from last common checkpoint at 2/D6930460 on timeline 10
reading source file list
reading target file list
reading WAL in target
need to copy 4168 MB (total source directory is 4186 MB)
4268372/4268372 kB (100%) copied
creating backup label and updating control file
syncing target data directory
Done!
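
To put a number on the summary line above, here is a minimal Python sketch that parses it and reports the fraction copied. It assumes the line format is exactly as shown in this log; the name `copy_ratio` is hypothetical and not part of pg_rewind.

```python
import re

# Matches the pg_rewind summary line as it appears in the log above.
SUMMARY_RE = re.compile(
    r"need to copy (\d+) MB \(total source directory is (\d+) MB\)"
)

def copy_ratio(log_text):
    """Return (copy_mb, total_mb, fraction) from pg_rewind output, or None."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    copy_mb, total_mb = int(match.group(1)), int(match.group(2))
    return copy_mb, total_mb, copy_mb / total_mb

line = "need to copy 4168 MB (total source directory is 4186 MB)"
copy_mb, total_mb, fraction = copy_ratio(line)
print(f"{copy_mb} of {total_mb} MB ({fraction:.1%})")
```

For the figures in this run, the copied amount is more than 99% of the source directory.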

If I run pg_rewind with the debug option, it only shows an additional list
of files copied from directories such as base or pg_tblspc. As far as I can
tell, no data has been inserted or modified since the first step. The only
difference between the two servers is caused by restarting the old master.
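
As a rough check on how little WAL separates the positions in the logs above, the LSNs can be converted to byte offsets and subtracted. This is only a sketch: it relies on the usual reading of an LSN as two hexadecimal halves (high 32 bits, then low 32 bits), and the helper names `lsn_to_int` and `lsn_diff` are hypothetical.

```python
def lsn_to_int(lsn):
    """Convert an LSN such as '2/D69304D0' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def lsn_diff(later, earlier):
    """Number of WAL bytes between two LSNs."""
    return lsn_to_int(later) - lsn_to_int(earlier)

diverged = "2/D69304D0"      # "servers diverged at WAL position ..."
checkpoint = "2/D6930460"    # "last common checkpoint at ..."
stream_start = "2/D6000000"  # "started streaming WAL from primary at ..."

print(lsn_diff(diverged, checkpoint))    # bytes from checkpoint to divergence
print(lsn_diff(diverged, stream_start))  # bytes streamed before the master stopped
```

This measures only the distance on the shared timeline 10. It says nothing about what the old master wrote after it was restarted, which is what pg_rewind reads in the "reading WAL in target" step.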

Thanks and Regards,

Hung Phan

On Wed, Sep 13, 2017 at 10:48 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:

> On Wed, Sep 13, 2017 at 12:41 PM, Hung Phan <hungphan227(at)gmail(dot)com> wrote:
> > I have tested pg_rewind (ver 9.5) with the following scenario:
> >
> > - one master and one slave (total size of each server is more than 4GB)
> > - set wal_log_hints=on and restart both
> > - stop master, promote slave
> > - start old master again (now two servers have diverged)
> > - stop old master, run pg_rewind with progress option
>
> That's a good flow. Don't forget to run a manual checkpoint after
> promotion to update the control file of the promoted standby, so that
> pg_rewind is able to identify the timeline difference between the
> source and the target servers.
>
> > The pg_rewind ran successfully but I saw it copied more than 4GB
> > (4265891 kB copied). So I wonder there was very minor difference
> > between two servers but why did pg_rewind copy almost all data of
> > new master?
>
> Without knowing exactly the list of things that have been registered
> as things to copy from the active source to the target, it is hard to
> draw a conclusion. But my bet here is that you left the target server
> online long enough that it had a bunch of blocks updated, causing more
> relation blocks to be copied from the source because more effort
> would be needed to re-sync it. That's only an assumption without data
> with clear numbers, numbers that could be found using the --debug
> messages of pg_rewind.
> --
> Michael
>
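
The flow discussed above, including the manual checkpoint after promotion, could be sketched roughly as follows. This only builds and prints the commands and does not run anything. The data directories and connection string are placeholders, the function name `rewind_commands` is hypothetical, and the fast shutdown mode is my assumption rather than something stated in the thread.

```python
def rewind_commands(old_master_dir, standby_dir, new_master_conninfo):
    """Return the command lines for the switchover, in order (not executed)."""
    return [
        # Stop the master, then promote the standby.
        ["pg_ctl", "-D", old_master_dir, "stop", "-m", "fast"],
        ["pg_ctl", "-D", standby_dir, "promote"],
        # Manual checkpoint on the promoted standby so that its control
        # file reflects the new timeline.
        ["psql", "-d", new_master_conninfo, "-c", "CHECKPOINT"],
        # Start and stop the old master, as in the test scenario.
        ["pg_ctl", "-D", old_master_dir, "-w", "start"],
        ["pg_ctl", "-D", old_master_dir, "stop", "-m", "fast"],
        # Rewind the old master against the new one, with progress and
        # debug output.
        ["pg_rewind",
         "--target-pgdata=" + old_master_dir,
         "--source-server=" + new_master_conninfo,
         "--progress", "--debug"],
    ]

# Placeholder paths and connection string, for illustration only.
commands = rewind_commands(
    "/path/to/old_master",
    "/path/to/standby",
    "host=new-master port=5432 user=postgres dbname=postgres",
)
for cmd in commands:
    print(" ".join(cmd))
```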
