Re: Problems with PG 9.3

From: Dhruv Shukla <dhruvshukla82(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "jayknowsunix(at)gmail(dot)com" <jayknowsunix(at)gmail(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Problems with PG 9.3
Date: 2014-08-26 15:27:27
Message-ID: CAFiWeJBbbCgHQme6G3LYXQM-n6kkPFxULSQ6zbEZqQcECpFKXg@mail.gmail.com
Lists: pgsql-admin

Hello All,
We changed the settings to:

max_connections = 200
work_mem = 1GB
temp_buffers = 1GB
shared_buffers = 70GB
maintenance_work_mem = 2GB
effective_cache_size = 180GB
max_locks_per_transaction = 128
autovacuum_max_workers = 12
checkpoint_segments = 256
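
To double-check what the server actually picked up, the values in effect
can be read back from pg_settings; the name list below simply mirrors the
settings above:

-- values currently in effect for the settings we changed
SELECT name, setting, unit, source
  FROM pg_settings
 WHERE name IN ('max_connections', 'work_mem', 'temp_buffers',
                'shared_buffers', 'maintenance_work_mem',
                'effective_cache_size', 'max_locks_per_transaction',
                'autovacuum_max_workers', 'checkpoint_segments');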

We also ran strace on one of the programs that was getting stuck and dying
by itself. Here is the strace log (run from another machine):

poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0LSINFO\0C00000\0M -"..., 16384, 0, NULL,
NULL) = 77
write(2, "INFO: ---> did 2\n", 30INFO: ---> did 2
) = 30
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0MSINFO\0C00000\0M -"..., 16384, 0, NULL,
NULL) = 78
write(2, "INFO: ---> did 14\n", 31INFO: ---> did
14
) = 31
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1 <unfinished ...>

On viewing the logs on the PostgreSQL server, we found this error:
FATAL: connection to client lost
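
While a client appears hung like this, one way to check from another
session whether its backend is still working or is blocked (column names
as of 9.3) is something along these lines:

-- non-idle backends: is the one serving the hung client running or waiting?
SELECT pid, client_addr, client_port, state, waiting,
       now() - query_start AS runtime, query
  FROM pg_stat_activity
 WHERE state <> 'idle';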

There have been some improvements: a few of the DB functions that were
getting stuck now run fine, but the main ones are still failing.

-Dhruv

On Mon, Aug 25, 2014 at 1:48 PM, Dhruv Shukla <dhruvshukla82(at)gmail(dot)com>
wrote:

> Kevin,
> This has been valuable information. Thanks a lot for it.
>
> I was able to reduce max_connections down to 200, but I wanted to run the
> other parameters by you before I go ahead and make changes.
>
> Now here comes the dilemma for me: we run memory-intensive processes in
> the morning that easily use 4-10GB of resident memory each. I thought of
> using work_mem = 1GB, which at 200 connections would commit around 200GB
> of RAM. There are also other processes that use temporary tables holding
> around 200K records.
>
> A temp_buffers setting of 1GB sounds good, but I am slightly worried it
> could slow down those memory-intensive processes.
>
> I am planning on bringing shared_buffers down to 60GB from 80GB.
>
> OS:
> vm.dirty_background_ratio = 10
>
> With regard to vm.dirty_background_ratio, what number do you think would be
> reasonable for this kind of scenario?
>
> Thanks,
> Dhruv
>
>
> On Mon, Aug 25, 2014 at 12:53 PM, Kevin Grittner <kgrittn(at)ymail(dot)com>
> wrote:
>
>> Dhruv Shukla <dhruvshukla82(at)gmail(dot)com> wrote:
>> > Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>> >> Dhruv Shukla <dhruvshukla82(at)gmail(dot)com> wrote:
>>
>> >>> Other informational details about configurations are:
>> >>> shared_buffers = 80GB
>> >>> temp_buffers = 2GB
>> >>> work_mem = 2GB
>> >>> maintenance_work_mem = 16GB
>>
>> >> How much RAM is on the machine (or VM)?
>>
>> > Currently we have a max connection setting for 1000 connections.
>>
>> The temp_buffers setting is a limit *per connection*, and once RAM
>> for a temporary table is allocated to a connection it is never
>> released; so even when all connections are idle they could be
>> setting aside 2TB of RAM for possible use for caching temporary
>> tables.
>>
>> The work_mem setting is an attempt to limit RAM per node of an
>> executing query; one connection can create many allocations of the
>> size set for work_mem. Since not all queries have nodes that
>> require such allocations, and not all connections are necessarily
>> active at the same time, a general rule of thumb is to allow for
>> one work_mem allocation per connection allowed by max_connections.
>> So these settings can easily result in another 2TB of allocations,
>> beyond the temp_buffers mentioned above.
>>
>> A high shared_buffers setting can result in "write gluts" at the OS
>> level when a high percentage of that memory becomes dirty and is
>> dumped to the OS in a short time. This can result in the OS
>> appearing to "lock up" for as long as it takes it to flush all of
>> the dirty data to disk. I don't know what your system is capable
>> of, but most database servers I've seen fall between 20MB/second
>> and 200MB/second. That would correspond to apparent stalls lasting
>> between 6.8 minutes and 1.1 hours. By tweaking the bgwriter
>> settings and the OS dirty settings you can drastically reduce this,
>> but I don't think you've indicated having done that, so 80GB can be
>> expected to cause apparent lockups of those durations.
>>
>> A high shared_buffers setting makes you more vulnerable to long
>> stalls because of transparent huge page operations of the OS.
>>
>> > And RAM on server is 384GB RAM.
>>
>> And overcommitting RAM by a factor of more than 10x is not going to
>> be pretty in any event.
>>
>> If I could not reduce max_connections, I would set work_mem to no
>> more than 100MB and temp_buffers to no more than 50MB. I would
>> drop maintenance_work_mem to 2GB. I would probably drastically
>> reduce shared_buffers and would most certainly make autovacuum and
>> bgwriter more aggressive than the default.
>>
>> If you make those changes and still see a problem, only then is it
>> worth looking at other possible causes.
>>
>> --
>> Kevin Grittner
>> EDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>
>
>
> --
> Regards
> Dhruv
> 404-551-2578
>

--
Regards
Dhruv
404-551-2578
