Re: BUG #14714: long running sessions from remote instance seems to hang some times

From: Josef Machytka <josef(dot)machytka(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14714: long running sessions from remote instance seems to hang some times
Date: 2017-06-23 11:37:33
Message-ID: CAGvVEFuZPMeY+zY7mv3kuQbYi3Y2msnom-tXaGZ_aT3gqqruAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

sorry, here is file with sample - basically several group by aggregations
over different partitioned tables have been locked in this "drowsy" state

On 23 June 2017 at 11:37, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

>
>
> 2017-06-23 11:26 GMT+02:00 Josef Machytka <josef(dot)machytka(at)gmail(dot)com>:
>
>> Today I got several sessions in this "drowsy" state and it almost caused
>> collapse of the whole reporting system because aggregations were not fully
>> calculated.
>> Looks like now on PG 9.6.3 is this problem even worse then on previous
>> 9.6.x subversions...
>>
>> I checked "top", "iotop -u postgres", "perf top" and "perf top -u
>> postgres" and as far as I could see those sessions are running although
>> they are marked as "S" sleeping in "top" command almost all the time.
>> From time to time they cause very low CPU load, something like from 0,5
>> to 0,8% - which is visible only in "perf" output. In "top" I see almost all
>> the time 0.0% on CPU.
>> The same with disk IO - iotop shows only a few MB/s from time to time.
>> They are marked as "active" in "pg_stat_activity" and no wait events are
>> shown.
>>
>> GCE Instance has Debian 8, 10 CPUs, 46GB RAM
>> Our database has 2390 GB is distributed over 3 different tablespaces on 3
>> different GCE standard persistent disks, wal logs and temp tablespace have
>> separate disk.
>>
>> Out pg config is:
>> shared_buffers = 8GB
>> temp_buffers = 512MB
>> work_mem = 192MB
>> maintenance_work_mem = 2GB
>> max_connections = 150
>> max_wal_size = 8GB
>> min_wal_size = 4GB
>> max_worker_processes = 50
>> max_parallel_workers_per_gather = 4
>>
>> There are several procedures which set work_mem to 256MB because tests
>> showed it speeds up processing really significantly.
>> Reason is when we switched from PG 9.5 to PG 9.6 we found that many old
>> queries had suddenly very bad explain plan and very long run time.
>> Only setting work_mem from previous 64MB to the current values helped to
>> get the same or better results for those queries on 9.6.
>> Of course we upgraded instance with more memory. But we do not use swap.
>>
>> Shall I look for some special perf symbols in output? Or is there a
>> possibility to debug it deeper?
>>
>>
> what is perf top output?
>
> you can install debug symbols for postgres and for kernel
> https://wiki.debian.org/DebugPackage
>
> Regards
>
> Pavel
>
> Thanks
>>
>> Josef
>>
>>
>> On 19 June 2017 at 17:56, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>
>>>
>>>
>>> 2017-06-19 16:49 GMT+02:00 <josef(dot)machytka(at)gmail(dot)com>:
>>>
>>>> The following bug has been logged on the website:
>>>>
>>>> Bug reference: 14714
>>>> Logged by: Josef Machytka
>>>> Email address: josef(dot)machytka(at)gmail(dot)com
>>>> PostgreSQL version: 9.6.3
>>>> Operating system: Debian jessie
>>>> Description:
>>>>
>>>> On pg 9.6.* we have sometimes strange problem with some sessions. We
>>>> experienced it from bash scripts, golang program and node.js
>>>> applications.
>>>> Environment are Google Compute Engine instances with Debian 8.
>>>>
>>>> Sometimes some session goes into some kind of "drowsy" state and runs
>>>> incredibly slowly. It causes some very low disk IO and runs like 20x
>>>> slower
>>>> then usual. This happens mainly when database is under heavy load and it
>>>> happens randomly to different tasks. It happens randomly to both
>>>> connections
>>>> from local instance (cronjob running on instance with PG) and remote
>>>> connections from other instances.
>>>>
>>>> When we restart task it runs normally. I cannot see any changes in
>>>> process
>>>> priority in OS, process switches between sleeping and running state and
>>>> I do
>>>> not see any locks or waits in pg_stat_activity or pg_locks. Usually all
>>>> other processes finish OK and this "drowsy" one runs "for ever..."
>>>>
>>>> We implemented timeouts for PG tasks into our programs to restart tasks
>>>> but
>>>> it very annoying and we have to use long enough timeouts anyway so it
>>>> delays
>>>> tasks.... It happens from time to time also to dblink connections
>>>> inside pg
>>>> function so we had to rewrite our programs which were running without
>>>> any
>>>> problems on PG 9.5. Problems started when we switched to PG 9.6 and we
>>>> experienced them on all minor versions - we currently run 9.6.3 and
>>>> problem
>>>> is still present...
>>>>
>>>> Do you please have any advice what to check or what setting might help?
>>>>
>>>
>>> it can be spin lock issue. Try to use "perf top". Install
>>> postgresql-server debug symbols before.
>>>
>>> what is your share buffers configuration?
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sent via pgsql-bugs mailing list (pgsql-bugs(at)postgresql(dot)org)
>>>> To make changes to your subscription:
>>>> http://www.postgresql.org/mailpref/pgsql-bugs
>>>>
>>>
>>>
>>
>

Attachment Content-Type Size
perfoutput.txt text/plain 8.1 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2017-06-23 14:14:23 Re: BUG #14714: long running sessions from remote instance seems to hang some times
Previous Message Pavel Stehule 2017-06-23 09:37:06 Re: BUG #14714: long running sessions from remote instance seems to hang some times