Re: Unexpected trouble from pg_basebackup

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: otheus uibk <otheus(dot)uibk(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Unexpected trouble from pg_basebackup
Date: 2016-10-05 18:34:53
Message-ID: CABUevExY9SDDD_d=o2DxECVA3zihmHjGEcAOrb=3=m4AgERDtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Oct 5, 2016 at 3:55 PM, otheus uibk <otheus(dot)uibk(at)gmail(dot)com> wrote:

>
>
> On Tue, Oct 4, 2016 at 10:49 PM, Magnus Hagander <magnus(at)hagander(dot)net>
> wrote:
>
>>
>>
>> On Tue, Oct 4, 2016 at 10:42 PM, otheus uibk <otheus(dot)uibk(at)gmail(dot)com>
>> wrote:
>>
>>> After a 3 to 4 minute delay, pg_basebackup started doing it's thing and
>>> finished within a few minutes. So now the question is: why the startup
>>> delay?
>>>
>>
>>
>> Sounds to me like it's doing a CHECKPOINT with spreading, which make it
>> take time. Try with "-c fast" and see if the problem goes away.
>>
>
> Maybe not too far off.
>
>
> 2016-10-04 17:43:40.620 GMT 57eb90a0.6e07
> 402 00000 LOG: checkpoint complete: wrote 12799 buffers (1.0%); 0
> transaction log file(s) added, 0 removed, 5 recycled; write=1389.348 s,
> sync=0.033 s, total=1389.400 s; sync files=240, longest=0.003 s,
> average=0.000 s; distance=92915 kB, estimate=129373 kB
> 2016-10-04 18:20:31.714 GMT 57eb90a0.6e07
> 403 00000 LOG: checkpoint starting: time
> 2016-10-04 18:39:19.870 GMT 57eb90a0.6e07
> 404 00000 LOG: checkpoint complete: wrote 10265 buffers (0.8%); 0
> transaction log file(s) added, 0 removed, 4 recycled; write=1128.118 s,
> sync=0.023 s, total=1128.155 s; sync files=190, longest=0.002 s,
> average=0.000 s; distance=73419 kB, estimate=123778 kB
> 2016-10-04 19:07:12.647 GMT [unknown] pgsync 57f3fde0.52e3 2
> 00000 LOG: replication connection authorized: user=pgsync
> 2016-10-04 19:07:12.703 GMT 57eb90a0.6e07
> 405 00000 LOG: checkpoint starting: force wait
> 2016-10-04 19:20:32.879 GMT [unknown] pgsync 57f40100.5891 2
> 00000 LOG: replication connection authorized: user=pgsync
> 2016-10-04 19:23:05.249 GMT 57eb90a0.6e07
> 406 00000 LOG: checkpoint complete: wrote 8638 buffers (0.7%); 0
> transaction log file(s) added, 0 removed, 5 recycled; write=952.514 s,
> sync=0.016 s, total=952.546 s; sync files=238, longest=0.002 s,
> average=0.000 s; distance=68257 kB, estimate=118226 kB
> 2016-10-04 19:23:05.249 GMT 57eb90a0.6e07
> 407 00000 LOG: checkpoint starting: force wait
> 2016-10-04 19:28:52.232 GMT 57eb90a0.6e07
> 408 00000 LOG: checkpoint complete: wrote 3102 buffers (0.2%); 0
> transaction log file(s) added, 0 removed, 0 recycled; write=346.957 s,
> sync=0.018 s, total=346.982 s; sync files=149, longest=0.002 s,
> average=0.000 s; distance=36016 kB, estimate=110005 kB
> 2016-10-04 19:30:31.922 GMT [unknown] pgsync 57f40357.5c70 2
> 00000 LOG: replication connection authorized: user=pgsync
> 2016-10-04 19:30:31.932 GMT 57eb90a0.6e07
> 409 00000 LOG: checkpoint starting: force wait
> 2016-10-04 19:33:40.857 GMT 57eb90a0.6e07
> 410 00000 LOG: checkpoint complete: wrote 1763 buffers (0.1%); 0
> transaction log file(s) added, 0 removed, 1 recycled; write=188.886 s,
> sync=0.030 s, total=188.924 s; sync files=125, longest=0.004 s,
> average=0.000 s; distance=13135 kB, estimate=100318 kB
>
>
>
> OK, so what was happening is that the checkpoints were taking 5 to 15
> minutes minutes, and aborted basebackups were triggering new checkpoints
> which waited on the previous ones.
>
> Is it possible the new environment is not high-performance enough??
>
> possibly relevant configuration change:
> checkpoint_timeout=1h
>
>
>
>
pg_basebackup will always send in a checkpoint. PostgreSQL will time that
to take approximately checkpoint_timeout * checkpoint_completion_target
time. So with checkpoint_timeout set to a very high value, it will by
default target something like 30 minutes before it even gets started. The
only reason it takes as *little* as it does is that your system is lightly
loaded.

You can do a fast checkpoint with the parameter I suggested before, or you
can tune your checkpoint_timeout to be something that gives you a more
reasonable time.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Devrim Gündüz 2016-10-05 18:58:47 Re: Installing pgAdmin 4 in Oracle Enterprise Linux 7
Previous Message Darren Lafreniere 2016-10-05 18:30:35 Re: BRIN indexes and ORDER BY