Re: BUG #14714: long running sessions from remote instance seems to hang some times

From: Josef Machytka <josef(dot)machytka(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14714: long running sessions from remote instance seems to hang some times
Date: 2017-06-23 09:26:41
Message-ID: CAGvVEFuEz2bPFoG-3weuorB6-T-2i_evzEzDzgqnHN7TBCpfUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Today I got several sessions in this "drowsy" state and it almost caused
collapse of the whole reporting system because aggregations were not fully
calculated.
Looks like now on PG 9.6.3 is this problem even worse then on previous
9.6.x subversions...

I checked "top", "iotop -u postgres", "perf top" and "perf top -u postgres"
and as far as I could see those sessions are running although they are
marked as "S" sleeping in "top" command almost all the time.
From time to time they cause very low CPU load, something like from 0,5 to
0,8% - which is visible only in "perf" output. In "top" I see almost all
the time 0.0% on CPU.
The same with disk IO - iotop shows only a few MB/s from time to time.
They are marked as "active" in "pg_stat_activity" and no wait events are
shown.

GCE Instance has Debian 8, 10 CPUs, 46GB RAM
Our database has 2390 GB is distributed over 3 different tablespaces on 3
different GCE standard persistent disks, wal logs and temp tablespace have
separate disk.

Out pg config is:
shared_buffers = 8GB
temp_buffers = 512MB
work_mem = 192MB
maintenance_work_mem = 2GB
max_connections = 150
max_wal_size = 8GB
min_wal_size = 4GB
max_worker_processes = 50
max_parallel_workers_per_gather = 4

There are several procedures which set work_mem to 256MB because tests
showed it speeds up processing really significantly.
Reason is when we switched from PG 9.5 to PG 9.6 we found that many old
queries had suddenly very bad explain plan and very long run time.
Only setting work_mem from previous 64MB to the current values helped to
get the same or better results for those queries on 9.6.
Of course we upgraded instance with more memory. But we do not use swap.

Shall I look for some special perf symbols in output? Or is there a
possibility to debug it deeper?

Thanks

Josef

On 19 June 2017 at 17:56, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

>
>
> 2017-06-19 16:49 GMT+02:00 <josef(dot)machytka(at)gmail(dot)com>:
>
>> The following bug has been logged on the website:
>>
>> Bug reference: 14714
>> Logged by: Josef Machytka
>> Email address: josef(dot)machytka(at)gmail(dot)com
>> PostgreSQL version: 9.6.3
>> Operating system: Debian jessie
>> Description:
>>
>> On pg 9.6.* we have sometimes strange problem with some sessions. We
>> experienced it from bash scripts, golang program and node.js applications.
>> Environment are Google Compute Engine instances with Debian 8.
>>
>> Sometimes some session goes into some kind of "drowsy" state and runs
>> incredibly slowly. It causes some very low disk IO and runs like 20x
>> slower
>> then usual. This happens mainly when database is under heavy load and it
>> happens randomly to different tasks. It happens randomly to both
>> connections
>> from local instance (cronjob running on instance with PG) and remote
>> connections from other instances.
>>
>> When we restart task it runs normally. I cannot see any changes in process
>> priority in OS, process switches between sleeping and running state and I
>> do
>> not see any locks or waits in pg_stat_activity or pg_locks. Usually all
>> other processes finish OK and this "drowsy" one runs "for ever..."
>>
>> We implemented timeouts for PG tasks into our programs to restart tasks
>> but
>> it very annoying and we have to use long enough timeouts anyway so it
>> delays
>> tasks.... It happens from time to time also to dblink connections inside
>> pg
>> function so we had to rewrite our programs which were running without any
>> problems on PG 9.5. Problems started when we switched to PG 9.6 and we
>> experienced them on all minor versions - we currently run 9.6.3 and
>> problem
>> is still present...
>>
>> Do you please have any advice what to check or what setting might help?
>>
>
> it can be spin lock issue. Try to use "perf top". Install
> postgresql-server debug symbols before.
>
> what is your share buffers configuration?
>
> Regards
>
> Pavel
>
>>
>>
>>
>>
>> --
>> Sent via pgsql-bugs mailing list (pgsql-bugs(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-bugs
>>
>
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Pavel Stehule 2017-06-23 09:37:06 Re: BUG #14714: long running sessions from remote instance seems to hang some times
Previous Message shailesh.h.singh 2017-06-23 09:26:22 BUG #14716: Backup failed