From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Lonni J Friedman <netllama(at)gmail(dot)com> |
Cc: | Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-admin(at)postgresql(dot)org |
Subject: | Re: pg_basebackup blocking all queries with horrible performance |
Date: | 2012-06-12 17:49:11 |
Message-ID: | CAHGQGwHEz+B9SpxBNwRyr83YYJ-SugZ1tQFZY2g9n29x4a_Crw@mail.gmail.com |
Lists: | pgsql-admin pgsql-hackers |
On Tue, Jun 12, 2012 at 2:37 AM, Lonni J Friedman <netllama(at)gmail(dot)com> wrote:
> On Fri, Jun 8, 2012 at 7:29 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Sat, Jun 9, 2012 at 4:30 AM, Lonni J Friedman <netllama(at)gmail(dot)com> wrote:
>>> On Thu, Jun 7, 2012 at 11:04 PM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:
>>>> On 06/08/2012 09:01 AM, Lonni J Friedman wrote:
>>>>>
>>>>> On Thu, Jun 7, 2012 at 5:07 PM, Jerry Sievers <gsievers19(at)comcast(dot)net> wrote:
>>>>>>
>>>>>> You might try stopping pg_basebackup in place with SIGSTOP and check
>>>>>> if the problem goes away. Send SIGCONT and you should start seeing
>>>>>> the sluggishness again.
>>>>>>
>>>>>> If verified, then any sort of throttling mechanism should work.
>>>>>
>>>>>
>>>>> I'm certain that the problem is triggered only when pg_basebackup is
>>>>> running. It's very predictable, and goes away as soon as pg_basebackup
>>>>> finishes running. What do you mean by a throttling mechanism?
>>>>
>>>>
>>>> Sure, it only happens when pg_basebackup is running. But if you *pause*
>>>> pg_basebackup, so it's still running but not currently doing work, does the
>>>> problem go away? Does it come back when you unpause pg_basebackup? That's
>>>> what Jerry was telling you to try.
>>>>
>>>> If the problem goes away when you pause pg_basebackup and comes back when
>>>> you unpause it, it's probably a system load problem.
>>>>
>>>> If it doesn't go away, it's more likely to be a locking issue or something
>>>> _other_ than simple load.
>>>>
>>>> SIGSTOP ("kill -STOP") pauses a process, and SIGCONT ("kill -CONT") resumes
>>>> it, so on Linux you can use these to try and find out. When you SIGSTOP
>>>> pg_basebackup then the postgres backend associated with it should block
>>>> shortly afterwards as its buffers fill up and it can't send more data, so
>>>> the load should come off the server.
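A minimal Python sketch of the SIGSTOP/SIGCONT test Jerry and Craig describe, assuming Linux and that you already know the PID of the pg_basebackup process; the helper names `pause`, `resume` and `paused` are illustrative, not part of any PostgreSQL tool. The context manager guarantees the process is resumed even if the observation step fails.

```python
import contextlib
import os
import signal


def pause(pid):
    """Same effect as `kill -STOP <pid>`: the process is no longer scheduled."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Same effect as `kill -CONT <pid>`: the process carries on where it stopped."""
    os.kill(pid, signal.SIGCONT)


@contextlib.contextmanager
def paused(pid):
    """Keep `pid` stopped for the duration of the with-block, then always resume it."""
    pause(pid)
    try:
        yield
    finally:
        # Resume even if the observation step raises, so the backup is not left frozen.
        resume(pid)
```

For example, `with paused(basebackup_pid): time.sleep(60)` while timing queries from another session, where `basebackup_pid` is a placeholder for the real PID.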
>>>>
>>>> A "throttling mechanism" refers to anything that limits the rate or speed of
>>>> a thing. In this case, what you want to do if your problem is system
>>>> overload is to limit the speed at which pg_basebackup does its work so other
>>>> things can still get work done. In other words you want to throttle it.
>>>> Typical throttling mechanisms include the "ionice" and "renice" commands to
>>>> change I/O and CPU priority, respectively.
>>>>
>>>> Note that you may need to change the priority of the *backend* that
>>>> pg_basebackup is using, not necessarily the pg_basebackup command itself.
>>>> I haven't done enough with Pg's replication to know how that works, so
>>>> someone else will have to fill that bit in.
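A rough Python sketch of the two throttling knobs Craig mentions, assuming Linux. `renice` sets CPU niceness through `os.setpriority`, and `ionice_command` only builds an `ionice` command line (class 2 is best-effort with levels 0 to 7, class 3 is idle). Both helper names are hypothetical. Per Craig's note, the PID you pass may need to be the server backend serving pg_basebackup rather than the client process. I/O priorities only have an effect with I/O schedulers that implement them (CFQ at the time).

```python
import os


def renice(pid, niceness):
    """Set the CPU niceness of `pid` (19 = lowest priority), similar to `renice`.

    An unprivileged user can raise the niceness of their own processes;
    lowering it again usually needs extra privileges.
    """
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)


def ionice_command(pid, io_class=2, level=7):
    """Build an `ionice` command line for `pid`.

    Class 2 is best-effort with levels 0 (highest) to 7 (lowest);
    class 3 is idle and takes no level.
    """
    if io_class == 3:
        return ["ionice", "-c", "3", "-p", str(pid)]
    return ["ionice", "-c", str(io_class), "-n", str(level), "-p", str(pid)]
```

The command list can then be run with `subprocess.run(ionice_command(pid), check=True)`.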
>>>
>>> Thanks for your reply. I've confirmed that issuing a SIGSTOP does
>>> eliminate the thrashing, and issuing a SIGCONT resumes the thrash.
>>>
>>> I've looked at iostat output both before & during pg_basebackup runs,
>>> and I'm not seeing any indication that the problem is due to disk IO
>>> bottlenecks. The numbers don't vary very much at all between the good
>>> & bad times. This is typical when pg_basebackup is running:
>>> ########
>>> Device:  rrqm/s  wrqm/s    r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>>> md0        0.00    0.00  67.76  68.62   4.42   1.46     88.34      0.00   0.00     0.00     0.00   0.00   0.00
>>> ########
>>>
>>> and this is when the system is ok:
>>> ########
>>> Device:  rrqm/s  wrqm/s    r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>>> md0        0.00    0.00  68.04  68.56   4.44   1.46     88.39      0.00   0.00     0.00     0.00   0.00   0.00
>>> ########
>>>
>>>
>>> I looked at vmstat output, but nothing is jumping out at me as being
>>> dramatically different when pg_basebackup is running. swap in and
>>> swap out are zero 100% of the time for the good & bad perf cases. I
>>> can post example output if someone is interested, or if there's
>>> something specific that I should be looking at as a potential problem,
>>> let me know.
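To put a number on "the numbers don't vary very much", here is a small Python sketch that computes the relative change per column between the two md0 rows quoted above; columns that are zero in the baseline are skipped because the change is undefined. All of the remaining columns differ by less than half a percent. One thing worth checking: iostat's first report covers the time since boot, so if the samples were taken without an interval argument they would be dominated by history and would look nearly identical regardless of current load.

```python
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rMB/s", "wMB/s", "avgrq-sz",
           "avgqu-sz", "await", "r_await", "w_await", "svctm", "%util"]

# md0 rows from the two iostat samples quoted above.
NORMAL = "0.00 0.00 68.04 68.56 4.44 1.46 88.39 0.00 0.00 0.00 0.00 0.00 0.00"
DURING_BACKUP = "0.00 0.00 67.76 68.62 4.42 1.46 88.34 0.00 0.00 0.00 0.00 0.00 0.00"


def parse_row(row):
    """Map each iostat column name to its value for one device row."""
    values = [float(v) for v in row.split()]
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d columns, got %d" % (len(COLUMNS), len(values)))
    return dict(zip(COLUMNS, values))


def percent_change(before, after):
    """Relative change per column; columns that are 0 in `before` are skipped."""
    return {
        name: 100.0 * (after[name] - before[name]) / before[name]
        for name in COLUMNS
        if before[name] != 0.0
    }


changes = percent_change(parse_row(NORMAL), parse_row(DURING_BACKUP))
for name, pct in changes.items():
    print("%-9s %+.2f%%" % (name, pct))
```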
>>
>> Did you set synchronous_standby_names to '*'? If so, the problem you
>> encountered can happen.
>>
>> When synchronous_standby_names is '*', you cannot control which
>> standby takes the role of synchronous standby. A standby that you
>> expect to run asynchronously might become the synchronous one. So
>> my guess is that at first one of your three standbys was running as
>> the synchronous standby, and all queries were executed normally. But
>> when you started pg_basebackup, pg_basebackup unexpectedly
>> took over the role of synchronous standby from that standby. Since
>> pg_basebackup doesn't send information about replication
>> progress back to the master, all queries (more precisely, transaction
>> commits) got stuck waiting for a reply from the synchronous
>> standby.
>>
>> You can avoid this problem by setting synchronous_standby_names
>> to the names of your standbys (their application_name) instead of '*'.
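A simplified Python model of the scenario described above, assuming that a standby's priority is the position of the first synchronous_standby_names entry matching its application_name (or '*'), that the lowest positive priority wins, and that ties go to the earliest connection slot, which the administrator cannot control. It ignores quoted names and other details of the real server logic, and the application names used are hypothetical. An empty setting means all replication is asynchronous, which matches the configuration described in the reply below, so the model returns None in that case.

```python
def parse_standby_names(setting):
    """Split a synchronous_standby_names value such as 'standby1, standby2'.

    Simplified: double-quoted names are not handled.
    """
    return [name.strip() for name in setting.split(",") if name.strip()]


def standby_priority(application_name, standby_names):
    """1-based position of the first matching entry, or 0 if never synchronous."""
    for position, name in enumerate(standby_names, start=1):
        # '*' matches every standby, including a pg_basebackup connection.
        if name == "*" or name.lower() == application_name.lower():
            return position
    return 0


def pick_synchronous_standby(connected, standby_names):
    """Return the connected standby that would act as the synchronous one, or None.

    `connected` lists application_names in connection-slot order; on a
    priority tie the earliest slot wins in this model.
    """
    best, best_priority = None, 0
    for app_name in connected:
        priority = standby_priority(app_name, standby_names)
        if priority > 0 and (best is None or priority < best_priority):
            best, best_priority = app_name, priority
    return best
```

With `['*']`, a pg_basebackup connection sitting in an earlier slot than the real standbys is picked as synchronous; listing the standbys by name means it can never be picked.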
>
> I don't have synchronous_standby_names set at all. I'm only doing
> asynchronous replication.
Hmm... I have no idea yet what happened in your environment.
Could you show me a self-contained test case?
Regards,
--
Fujii Masao