Re: [GENERAL] openvz and shared memory trouble

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: Willy-Bas Loos <willybas(at)gmail(dot)com>, lst_hoe02(at)kwsoft(dot)de, pgsql-admin <pgsql-admin(at)postgresql(dot)org>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: [GENERAL] openvz and shared memory trouble
Date: 2014-03-31 15:01:02
Message-ID: 15934.1396278062@sss.pgh.pa.us
Lists: pgsql-admin pgsql-general

Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> writes:
> On 03/31/2014 04:12 AM, Willy-Bas Loos wrote:
>> I'm still worried that it's like Tom Lane said in another discussion: "So
>> basically, you've got a broken kernel here: it claimed to give PG circa
>> (135MB) of memory, but what's actually there is only about (128MB). I
>> don't see any connection between those numbers and the shmmax/shmall
>> settings, either --- so I think this must be some busted implementation
>> of a VM-level limitation."
>> (here:
>> http://www.postgresql.org/message-id/CAK3UJREBcyVBtr8D7vMfU=uDdkjXkrPnGcuy8erYB0tMfKe1LA@mail.gmail.com)
>>
>> And it makes me wonder what other issues may arise from that, and
>> especially what I can do about it.

FWIW, I went back and re-read that message while perusing this thread,
and this time it struck me that there was a significant bit of evidence
I'd overlooked: namely, that the buffer block array is by no means the
last thing in Postgres' shared memory segment. There are a bunch of
other shared data structures allocated after it, some of which almost
certainly had to have been touched by the startup subprocess. The gdb
output makes it clear that the kernel stopped providing memory at
0xb6c4b000; but either it resumed doing so further on, or the whole shared
memory segment *had* been provisioned originally, and then part of it
got unmapped again while the startup process was running.

So it's still clearly a kernel bug, but it seems less likely that it is
triggered by some static limit on shared memory size. Perhaps instead,
the kernel had been filling in pages for the shared segment on-demand,
and then when it got to some limit it refused to do so anymore and allowed
a SIGBUS to happen instead.
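
To make the on-demand hypothesis concrete, here is a minimal sketch (not PostgreSQL code, and not how the server sets up its own segment) that maps an anonymous shared region and writes one byte to every page, forcing the kernel to back each page with real memory. The function name touch_every_page is made up for illustration. On a sane kernel it returns normally; on a kernel behaving as described above, the process would presumably be killed by SIGBUS somewhere in the loop.

```python
import mmap


def touch_every_page(size_bytes):
    """Map an anonymous shared region and write one byte to each page.

    Returns the number of pages touched. The name and interface are
    hypothetical; this only illustrates faulting pages in one by one.
    """
    page = mmap.PAGESIZE
    # Round the request up to a whole number of pages.
    n_pages = (size_bytes + page - 1) // page
    # fileno -1 requests an anonymous mapping (shared by default on Unix).
    region = mmap.mmap(-1, n_pages * page)
    try:
        for i in range(n_pages):
            # The first write to a page makes the kernel provide memory
            # for it; a refusal at this point would surface as SIGBUS.
            region[i * page] = 1
        return n_pages
    finally:
        region.close()
```

Calling touch_every_page(135 * 1024 * 1024) inside the container would exercise a region of roughly the size mentioned in the quoted message, though a clean run does not prove the original failure cannot recur under different memory pressure.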

> I do not use openvz so I do not have a test bed to try out, but this
> page seems to be related to your problem:
> http://openvz.org/Resource_shortage
> or if you want more detail and a link to what looks to be a replacement for
> beancounters:
> http://openvz.org/Setting_UBC_parameters

If this software's idea of resource management is to allow SIGBUS to
happen upon attempting to use memory that had been successfully granted,
then it's a piece of junk that you should get rid of ASAP. (No, I
don't like Linux's OOM-kill solution to resource overcommit either.)
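
For anyone wanting to check whether the container has actually hit one of the beancounter limits described on the pages linked above, a rough sketch follows. It assumes the usual /proc/user_beancounters layout (an optional "uid:" prefix, then resource, held, maxheld, barrier, limit, failcnt); that layout is an assumption here, not something taken from this thread, and the function name failed_resources is invented for illustration.

```python
def failed_resources(text):
    """Return {resource: failcnt} for beancounter rows with failcnt > 0.

    Assumes each data row ends with six fields:
    resource held maxheld barrier limit failcnt
    """
    failures = {}
    for line in text.splitlines():
        fields = line.split()
        # The first row of each container starts with "<uid>:"; drop it.
        if fields and fields[0].endswith(":"):
            fields = fields[1:]
        if len(fields) != 6:
            continue  # version line, header line, or blank line
        name, *numbers = fields
        if not all(n.isdigit() for n in numbers):
            continue
        failcnt = int(numbers[-1])
        if failcnt > 0:
            failures[name] = failcnt
    return failures
```

Something like failed_resources(open("/proc/user_beancounters").read()), run with sufficient privileges, would then list the resources whose limits have been exceeded at least once. A nonzero count for a memory-related resource would be consistent with the theory above, but would not by itself confirm it.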

regards, tom lane