Re: improve wals replay on secondary

From: Mariel Cherkassky <mariel(dot)cherkassky(at)gmail(dot)com>
To: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>
Cc: Fabio Pardi <f(dot)pardi(at)portavita(dot)eu>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: improve wals replay on secondary
Date: 2019-05-29 11:10:59
Message-ID: CA+t6e1=WwbepgtZbYiOMs83LCR3JCeqv1JX1q05cAdLtRtsXfg@mail.gmail.com
Lists: pgsql-performance

In my case, I'm talking about creating a standby ("standby clone") and then
recovering the WALs. How can I run a checkpoint on the secondary if it isn't
up?

On Wed, May 29, 2019 at 13:54 Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com> wrote:

> Hello Mariel,
>
> 1) Have you tried to run “CHECKPOINT;” on the standby to perform
> restartpoint explicitly? It is possible.
>
> 2) If we talk about streaming replication, do you use replication slots?
> If so, have you checked pg_replication_slots and pg_stat_replication? They
> can help to troubleshoot streaming replication delays — see columns
> sent_location, write_location, flush_location, and replay_location in
> pg_stat_replication and restart_lsn in pg_replication_slots. If you have
> delay in replaying, it should be seen there.
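>
> For example, a minimal sketch (run on the primary; these are the 9.6
> column names, renamed in PostgreSQL 10 and later):
>
>   SELECT application_name, state,
>          sent_location, write_location, flush_location, replay_location
>   FROM pg_stat_replication;
>
>   SELECT slot_name, active, restart_lsn
>   FROM pg_replication_slots;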
>
> On Wed, May 29, 2019 at 11:39 Mariel Cherkassky <
> mariel(dot)cherkassky(at)gmail(dot)com> wrote:
>
>> Is there any message that indicates that the secondary replayed a specific
>> WAL? "restored 00000..." means that the restore_command succeeded, but
>> there isn't any proof that it replayed the WAL.
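>>
>> (A minimal check, once the standby is far enough along to accept read-only
>> connections; these are the 9.6 function names, renamed in later releases:)
>>
>>   SELECT pg_last_xlog_receive_location() AS received,
>>          pg_last_xlog_replay_location()  AS replayed,
>>          pg_last_xact_replay_timestamp() AS last_replayed_commit;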
>>
>> My theory regarding the issue:
>> It seems that my customer stopped the db 20 minutes after the clone had
>> finished. During those 20 minutes the secondary didn't receive enough WAL
>> records (6 WAL files), so it didn't reach max_wal_size. My
>> checkpoint_timeout is set to 30 minutes, therefore there wasn't any
>> checkpoint. As a result, the secondary didn't reach a restart point.
>> Does that sound reasonable?
>>
>> So basically, if I clone a small primary db, the secondary would reach a
>> restart point only if it reached a checkpoint (checkpoint_timeout or
>> max_wal_size). However, I have cloned many small dbs and saw that it takes
>> a second to start the secondary (which means that a restartpoint was
>> reached). So what am I missing?
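>>
>> (One way to check that while the standby is not yet accepting connections
>> is pg_controldata against the cloned data directory; a sketch, assuming the
>> PGDG layout used elsewhere in this thread and a hypothetical data directory
>> path:)
>>
>>   /usr/pgsql-9.6/bin/pg_controldata /var/lib/pgsql/9.6/data \
>>       | grep -E 'cluster state|Latest checkpoint'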
>>
>> On Wed, May 29, 2019 at 11:20 Fabio Pardi <f(dot)pardi(at)portavita(dot)eu> wrote:
>>
>>>
>>>
>>> On 5/29/19 9:20 AM, Mariel Cherkassky wrote:
>>> > First of all, thanks Fabio.
>>> > I think that I'm missing something:
>>> > In the next questions I'm not talking about streaming replication, but
>>> > rather about recovery:
>>> >
>>> > 1. When the secondary gets the WALs from the primary, it tries to replay
>>> > them, correct?
>>>
>>>
>>> correct
>>>
>>> >
>>> > 2. By replaying, does it just go over the WAL records and apply them on
>>> > the secondary?
>>> >
>>>
>>> correct
>>>
>>> > 3. Are all those changes saved in the secondary's shared_buffers, or are
>>> > the changes applied immediately to the data file blocks?
>>> >
>>>
>>> the changes are not written to your data files yet. That happens at
>>> checkpoint time.
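>>>
>>> (As an illustration, assuming the pg_buffercache extension is installed
>>> and the standby is already open for read-only queries, you can count the
>>> dirty buffers waiting to be flushed at the next restartpoint:)
>>>
>>>   SELECT count(*) AS dirty_buffers FROM pg_buffercache WHERE isdirty;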
>>>
>>> > 4. The secondary will need a checkpoint in order to flush those changes
>>> > to the data files and to reach a restart point?
>>> >
>>>
>>> yes
>>>
>>> > So basically, if I had a checkpoint during the clone, the secondary
>>> > should also have a checkpoint when I recover it, right?
>>> >
>>>
>>> correct. Even after being in sync with the master, if you restart Postgres
>>> on the standby, it will then re-apply the WAL files from the last
>>> checkpoint.
>>>
>>> In the logfile of the standby, you will see as many messages reporting
>>> "restored log file" as WAL files were produced since the last checkpoint.
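>>>
>>> (For instance, to count them; the log file path here is only an example,
>>> adjust it to your logging setup:)
>>>
>>>   grep -c 'restored log file' /var/lib/pgsql/9.6/data/pg_log/postgresql-Wed.log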
>>>
>>> Hope it helps to clarify.
>>>
>>> regards,
>>>
>>> fabio pardi
>>> >
>>> > On Tue, May 28, 2019 at 13:54 Fabio Pardi <f(dot)pardi(at)portavita(dot)eu> wrote:
>>> >
>>> > Hi Mariel,
>>> >
>>> > please keep the list posted. When you reply, use 'reply all'. That may
>>> > help others in the community, and you might also get more help from
>>> > others.
>>> >
>>> > Answers are inline below.
>>> >
>>> >
>>> >
>>> > On 28/05/2019 10:54, Mariel Cherkassky wrote:
>>> > > I have pg 9.6, repmgr version 4.3.
>>> > > I see in the logs that WAL files are restored:
>>> > > 2019-05-22 12:35:12 EEST 60942 LOG: restored log file "000000010000377B000000DB" from archive
>>> > > That means that the restore_command worked, right?
>>> > >
>>> >
>>> > right
>>> >
>>> > > According to the docs:
>>> > > "In standby mode, a restartpoint is also triggered if
>>> > > checkpoint_segments log segments have been replayed since last
>>> > > restartpoint and at least one checkpoint record has been replayed.
>>> > > Restartpoints can't be performed more frequently than checkpoints in
>>> > > the master because restartpoints can only be performed at checkpoint
>>> > > records"
>>> > > So maybe I should decrease max_wal_size or even checkpoint_timeout to
>>> > > force a restartpoint?
>>> > > During this gap (standby clone) 6-7 WAL files were generated on the
>>> > > primary.
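>>> > >
>>> > > (If I did try that, a sketch of what the standby's postgresql.conf
>>> > > might look like; the values are arbitrary illustrations, not
>>> > > recommendations:)
>>> > >
>>> > >   checkpoint_timeout = 10min      # instead of 30min
>>> > >   max_wal_size = 512MB            # instead of 2GB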
>>> > >
>>> >
>>> >
>>> > From what you posted earlier, you should in any case have hit a
>>> > checkpoint every 30 minutes. (That was also the assumption in the
>>> > previous messages. If that's not happening, then I would really
>>> > investigate.)
>>> >
>>> > That is, if during your cloning only a few WAL files were generated,
>>> > then that is not enough to trigger a checkpoint and you fall back to
>>> > the 30-minute timeout.
>>> >
>>> > I would not be bothered if I were you, but you can always force a
>>> > checkpoint on the master by issuing:
>>> >
>>> > CHECKPOINT ;
>>> >
>>> > At that stage, in the standby logs you will see the messages:
>>> >
>>> > restartpoint starting: ..
>>> >
>>> > restartpoint complete: ..
>>> >
>>> >
>>> >
>>> > regards,
>>> >
>>> > fabio pardi
>>> >
>>> > >
>>> > > On Mon, May 27, 2019 at 14:04 Fabio Pardi <f(dot)pardi(at)portavita(dot)eu> wrote:
>>> > >
>>> > > If you did not even see this message in your standby logs:
>>> > >
>>> > > restartpoint starting: xlog
>>> > >
>>> > > then it means that a checkpoint was never even started.
>>> > >
>>> > > In that case, I have no clue.
>>> > >
>>> > > Try to describe step by step how to reproduce the problem together
>>> > > with your setup and the version numbers of Postgres and repmgr, and
>>> > > I might be able to help you further.
>>> > >
>>> > > regards,
>>> > >
>>> > > fabio pardi
>>> > >
>>> > > On 5/27/19 12:17 PM, Mariel Cherkassky wrote:
>>> > > > standby_mode = 'on'
>>> > > > primary_conninfo = 'host=X.X.X.X user=repmgr connect_timeout=10 '
>>> > > > recovery_target_timeline = 'latest'
>>> > > > primary_slot_name = repmgr_slot_1
>>> > > > restore_command = 'rsync -avzhe ssh postgres(at)x(dot)x(dot)x(dot)x:/var/lib/pgsql/archive/%f /var/lib/pgsql/archive/%f ; gunzip < /var/lib/pgsql/archive/%f > %p'
>>> > > > archive_cleanup_command = '/usr/pgsql-9.6/bin/pg_archivecleanup /var/lib/pgsql/archive %r'
>>> > > >
>>> > > > On Mon, May 27, 2019 at 12:29 Fabio Pardi <f(dot)pardi(at)portavita(dot)eu> wrote:
>>> > > >
>>> > > > Hi Mariel,
>>> > > >
>>> > > > let's keep the list in cc...
>>> > > >
>>> > > > settings look ok.
>>> > > >
>>> > > > what's in the recovery.conf file then?
>>> > > >
>>> > > > regards,
>>> > > >
>>> > > > fabio pardi
>>> > > >
>>> > > > On 5/27/19 11:23 AM, Mariel Cherkassky wrote:
>>> > > > > Hey,
>>> > > > > the configuration is the same as on the primary:
>>> > > > > max_wal_size = 2GB
>>> > > > > min_wal_size = 1GB
>>> > > > > wal_buffers = 16MB
>>> > > > > checkpoint_completion_target = 0.9
>>> > > > > checkpoint_timeout = 30min
>>> > > > >
>>> > > > > Regarding your question, I didn't see this message ("consistent
>>> > > > > recovery state reached at"), I guess that's why the secondary isn't
>>> > > > > available yet.
>>> > > > >
>>> > > > > Maybe I'm wrong, but what I understood from the documentation is
>>> > > > > that a restart point is generated only after the secondary has had
>>> > > > > a checkpoint, which means only after 30 minutes or after
>>> > > > > max_wal_size is reached? But still, why won't the secondary reach a
>>> > > > > consistent recovery state (does it require a restart point to be
>>> > > > > generated?)
>>> > > > >
>>> > > > >
>>> > > > > On Mon, May 27, 2019 at 12:12 Fabio Pardi <f(dot)pardi(at)portavita(dot)eu> wrote:
>>> > > > >
>>> > > > > Hi Mariel,
>>> > > > >
>>> > > > > if I'm not wrong, on the secondary you will see the messages you
>>> > > > > mentioned when a checkpoint happens.
>>> > > > >
>>> > > > > What are checkpoint_timeout and max_wal_size on your standby?
>>> > > > >
>>> > > > > Did you ever see this in your standby log?
>>> > > > >
>>> > > > > "consistent recovery state reached at .."
>>> > > > >
>>> > > > >
>>> > > > > Maybe you can post the whole configuration of your standby for
>>> > > > > easier debugging.
>>> > > > >
>>> > > > > regards,
>>> > > > >
>>> > > > > fabio pardi
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > On 5/27/19 10:49 AM, Mariel Cherkassky wrote:
>>> > > > > > Hey,
>>> > > > > > PG 9.6, I have a standalone configured. I tried to start up a
>>> > > > > > secondary by running standby clone (repmgr). The clone process
>>> > > > > > took 3 hours and during that time WAL files were generated
>>> > > > > > (mostly because of the checkpoint_timeout). As a result, when I
>>> > > > > > start the secondary, I see that the secondary keeps getting the
>>> > > > > > WAL files but I don't see any messages that indicate that the
>>> > > > > > secondary tried to replay them.
>>> > > > > > Messages that I see:
>>> > > > > > receiving incremental file list
>>> > > > > > 000000010000377B000000DE
>>> > > > > >
>>> > > > > > sent 30 bytes received 4.11M bytes 8.22M bytes/sec
>>> > > > > > total size is 4.15M speedup is 1.01
>>> > > > > > 2019-05-22 12:48:10 EEST 60942 LOG: restored log file "000000010000377B000000DE" from archive
>>> > > > > > 2019-05-22 12:48:11 EEST db63311 FATAL: the database system is starting up
>>> > > > > > 2019-05-22 12:48:12 EEST db63313 FATAL: the database system is starting up
>>> > > > > >
>>> > > > > > I was hoping to see the following messages (taken from a different machine):
>>> > > > > > 2019-05-27 01:15:37 EDT 7428 LOG: restartpoint starting: time
>>> > > > > > 2019-05-27 01:16:18 EDT 7428 LOG: restartpoint complete: wrote 406 buffers (0.2%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=41.390 s, sync=0.001 s, total=41.582 s; sync files=128, longest=0.000 s, average=0.000 s; distance=2005 kB, estimate=2699 kB
>>> > > > > > 2019-05-27 01:16:18 EDT 7428 LOG: recovery restart point at 4/D096C4F8
>>> > > > > >
>>> > > > > > My primary settings (WAL settings):
>>> > > > > > wal_buffers = 16MB
>>> > > > > > checkpoint_completion_target = 0.9
>>> > > > > > checkpoint_timeout = 30min
>>> > > > > >
>>> > > > > > Any idea what can explain why the secondary doesn't replay the
>>> > > > > > WAL files?
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
