Re: BUG #14714: long running sessions from remote instance seems to hang some times

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Josef Machytka <josef(dot)machytka(at)gmail(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14714: long running sessions from remote instance seems to hang some times
Date: 2017-06-23 09:37:06
Message-ID: CAFj8pRA_TOScxvsB8XwrtXRVGtL_0mgG-kWO055NkaRZhYBJRg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

2017-06-23 11:26 GMT+02:00 Josef Machytka <josef(dot)machytka(at)gmail(dot)com>:

> Today I got several sessions in this "drowsy" state and it almost caused
> collapse of the whole reporting system because aggregations were not fully
> calculated.
> Looks like now on PG 9.6.3 is this problem even worse then on previous
> 9.6.x subversions...
>
> I checked "top", "iotop -u postgres", "perf top" and "perf top -u
> postgres" and as far as I could see those sessions are running although
> they are marked as "S" sleeping in "top" command almost all the time.
> From time to time they cause very low CPU load, something like from 0,5 to
> 0,8% - which is visible only in "perf" output. In "top" I see almost all
> the time 0.0% on CPU.
> The same with disk IO - iotop shows only a few MB/s from time to time.
> They are marked as "active" in "pg_stat_activity" and no wait events are
> shown.
>
> GCE Instance has Debian 8, 10 CPUs, 46GB RAM
> Our database has 2390 GB is distributed over 3 different tablespaces on 3
> different GCE standard persistent disks, wal logs and temp tablespace have
> separate disk.
>
> Out pg config is:
> shared_buffers = 8GB
> temp_buffers = 512MB
> work_mem = 192MB
> maintenance_work_mem = 2GB
> max_connections = 150
> max_wal_size = 8GB
> min_wal_size = 4GB
> max_worker_processes = 50
> max_parallel_workers_per_gather = 4
>
> There are several procedures which set work_mem to 256MB because tests
> showed it speeds up processing really significantly.
> Reason is when we switched from PG 9.5 to PG 9.6 we found that many old
> queries had suddenly very bad explain plan and very long run time.
> Only setting work_mem from previous 64MB to the current values helped to
> get the same or better results for those queries on 9.6.
> Of course we upgraded instance with more memory. But we do not use swap.
>
> Shall I look for some special perf symbols in output? Or is there a
> possibility to debug it deeper?
>
>
what is perf top output?

you can install debug symbols for postgres and for kernel
https://wiki.debian.org/DebugPackage

Regards

Pavel

Thanks
>
> Josef
>
>
> On 19 June 2017 at 17:56, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
>>
>>
>> 2017-06-19 16:49 GMT+02:00 <josef(dot)machytka(at)gmail(dot)com>:
>>
>>> The following bug has been logged on the website:
>>>
>>> Bug reference: 14714
>>> Logged by: Josef Machytka
>>> Email address: josef(dot)machytka(at)gmail(dot)com
>>> PostgreSQL version: 9.6.3
>>> Operating system: Debian jessie
>>> Description:
>>>
>>> On pg 9.6.* we have sometimes strange problem with some sessions. We
>>> experienced it from bash scripts, golang program and node.js
>>> applications.
>>> Environment are Google Compute Engine instances with Debian 8.
>>>
>>> Sometimes some session goes into some kind of "drowsy" state and runs
>>> incredibly slowly. It causes some very low disk IO and runs like 20x
>>> slower
>>> then usual. This happens mainly when database is under heavy load and it
>>> happens randomly to different tasks. It happens randomly to both
>>> connections
>>> from local instance (cronjob running on instance with PG) and remote
>>> connections from other instances.
>>>
>>> When we restart task it runs normally. I cannot see any changes in
>>> process
>>> priority in OS, process switches between sleeping and running state and
>>> I do
>>> not see any locks or waits in pg_stat_activity or pg_locks. Usually all
>>> other processes finish OK and this "drowsy" one runs "for ever..."
>>>
>>> We implemented timeouts for PG tasks into our programs to restart tasks
>>> but
>>> it very annoying and we have to use long enough timeouts anyway so it
>>> delays
>>> tasks.... It happens from time to time also to dblink connections inside
>>> pg
>>> function so we had to rewrite our programs which were running without any
>>> problems on PG 9.5. Problems started when we switched to PG 9.6 and we
>>> experienced them on all minor versions - we currently run 9.6.3 and
>>> problem
>>> is still present...
>>>
>>> Do you please have any advice what to check or what setting might help?
>>>
>>
>> it can be spin lock issue. Try to use "perf top". Install
>> postgresql-server debug symbols before.
>>
>> what is your share buffers configuration?
>>
>> Regards
>>
>> Pavel
>>
>>>
>>>
>>>
>>>
>>> --
>>> Sent via pgsql-bugs mailing list (pgsql-bugs(at)postgresql(dot)org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-bugs
>>>
>>
>>
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Josef Machytka 2017-06-23 11:37:33 Re: BUG #14714: long running sessions from remote instance seems to hang some times
Previous Message Josef Machytka 2017-06-23 09:26:41 Re: BUG #14714: long running sessions from remote instance seems to hang some times