Re: ERROR: too many dynamic shared memory segments

From: Nicola Contu <nicola(dot)contu(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>, Alessandro Aste <Alessandro(dot)aste(at)gtt(dot)net>
Subject: Re: ERROR: too many dynamic shared memory segments
Date: 2020-01-21 15:06:05
Message-ID: CAMTZZh2UL=+9VBD6SiADgN9r3HzNCZhpoPZbTFK2-1xhaXnMDw@mail.gmail.com
Lists: pgsql-general

Hey Thomas,
after a few months, we started having this issue again.
So we reverted the work_mem parameter to 600MB instead of 2GB, but the issue
is still there. A query hit a segmentation fault, the DB went into recovery
mode, and our app was read-only for a few minutes.

I understand we can increase max_connections so we can have many more
segments.

My question is: is there a way to see how many segments we actually reached?
Currently we have max_connections = 220, so by your formula (64 + 2 * 220) we
have about 500 shared segments.
We would like to increase max_connections to 300 or 400, but it would be great
to know beforehand whether this will solve the issue, since changing it
requires a restart of the service.
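
For context, a minimal SQL sketch of that formula (it only computes the
theoretical limit from max_connections, not the number of segments actually
in use; 300 and 400 are the candidate values mentioned above):

  SELECT s.max_conn,
         64 + 2 * s.max_conn AS dsm_segment_limit
  FROM (VALUES (current_setting('max_connections')::int),
               (300),
               (400)) AS s(max_conn);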

I know you were also talking about redesigning this part of PostgreSQL. Do
you know if anything has changed in any of the versions newer than 11.5?
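
For reference, a minimal sketch (hypothetical tables t1/t2, not our real
query) of how we can look at a plan for the Gather and Parallel Hash Join
nodes you describe below:

  -- Hypothetical tables t1/t2: the plan shows whether the planner chose
  -- Gather / Parallel Hash Join nodes, each of which needs dynamic shared
  -- memory segments when the query runs.
  EXPLAIN (COSTS OFF)
  SELECT count(*)
  FROM t1
  JOIN t2 USING (id);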

Thanks a lot,
Nicola

On Thu, Sep 12, 2019 at 01:01 Thomas Munro <
thomas(dot)munro(at)gmail(dot)com> wrote:

> On Wed, Sep 11, 2019 at 11:20 PM Nicola Contu <nicola(dot)contu(at)gmail(dot)com>
> wrote:
> > If the error persist I will try to revert the work_mem.
> > Thanks a lot
>
> Hi Nicola,
>
> It's hard to say exactly what the cause of the problem is in your case
> and how to avoid it, without knowing what your query plans look like.
> PostgreSQL allows 64 + 2 * max_connections segments to exist at a time,
> and it needs a number of them that depends on work_mem (in the case of
> Parallel Hash Join and Parallel Bitmap Index Scan), and also depends
> on the number of Gather nodes that appear in the plan, which in some
> unusual cases can result from partitioning.
>
> I've seen people reaching this error by running a lot of parallel
> queries concurrently. If that's the cause, then you can definitely
> get some relief by turning work_mem down, or by turning
> max_connections up (even though you don't want to allow more
> connections -- because it influences the formula for deciding on the
> DSM segment limit). We should probably adjust some of the internal
> constants to give us more slots, to avoid that problem, as discussed
> here:
>
>
> https://www.postgresql.org/message-id/flat/CA%2BhUKGL6H2BpGbiF7Lj6QiTjTGyTLW_vLR%3DSn2tEBeTcYXiMKw%40mail.gmail.com
>
> I've also seen people reaching this error by somehow coming up with
> plans that had a very large number of Gather nodes in them,
> corresponding to partitions; that's probably a bad plan (it'd
> presumably be better to terminate parallelism higher up in the plan,
> but these plans do seem to exist in the wild; I don't recall exactly
> why). I think we need a bit of a redesign so that if there are
> multiple Gather nodes, they share the same main DSM segment, instead
> of blowing through this limit.
>
> --
> Thomas Munro
> https://enterprisedb.com
>
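
A minimal sketch of the two knobs discussed above, written as ALTER SYSTEM
statements (example values only, not recommendations):

  -- Example values only. ALTER SYSTEM writes postgresql.auto.conf.
  ALTER SYSTEM SET work_mem = '600MB';     -- Parallel Hash Join's DSM needs depend on this
  ALTER SYSTEM SET max_connections = 300;  -- DSM slot limit becomes 64 + 2 * 300 = 664
  -- max_connections only takes effect after a full server restart;
  -- work_mem can be picked up with: SELECT pg_reload_conf();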
