Re: ERROR: too many dynamic shared memory segments

From: Julian Backes <julianbackes(at)gmail(dot)com>
To: Nicola Contu <nicola(dot)contu(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>, Alessandro Aste <Alessandro(dot)aste(at)gtt(dot)net>
Subject: Re: ERROR: too many dynamic shared memory segments
Date: 2020-01-29 10:23:57
Message-ID: CAPv0rXGVENB=pR7ZNc72Ej7Lm35deU728AdooVjMDCmL8BrVzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

we only had the "too many shared too many dynamic shared memory segments"
error but no segmentation faults. The error started occurring after
upgrading from postgres 10 to postgres 12 (server has 24 cores / 48
threads, i.e. many parallel workers). The error itself was not that much of
a problem but /dev/shm started filling up with orphaned files which
probably (?) had not been cleaned up by postgres after the parallel workers
died. In consequence, after some time, /dev/shm was full and everything
crashed.

Unfortunately, the only "solution" we found so far was to increase max
connections from 100 to 1000. After that (about 2 months ago I think), the
error had gone.

Maybe this helps...

Julian

Am Mi., 29. Jan. 2020 um 10:37 Uhr schrieb Nicola Contu <
nicola(dot)contu(at)gmail(dot)com>:

> This is the error on postgres log of the segmentation fault :
>
> 2020-01-21 14:20:29 GMT [] [42222]: [108-1] db=,user= LOG: server process
> (PID 2042) was terminated by signal 11: Segmentation fault
> 2020-01-21 14:20:29 GMT [] [42222]: [109-1] db=,user= DETAIL: Failed
> process was running: select pid from pg_stat_activity where query ilike
> 'REFRESH MATERIALIZED VIEW CONCURRENTLY matview_vrs_request_stats'
> 2020-01-21 14:20:29 GMT [] [42222]: [110-1] db=,user= LOG: terminating
> any other active server processes
> 2020-01-21 14:20:29 GMT [127.0.0.1(34256)] [2724]: [4-1]
> db=cmdv3,user=admin WARNING: terminating connection because of crash of
> another server process
> 2020-01-21 14:20:29 GMT [127.0.0.1(34256)] [2724]: [5-1]
> db=cmdv3,user=admin DETAIL: The postmaster has commanded this server
> process to roll back the current transaction and exit, because another
> server process exited abnormally and possibly corrupted shared memory.
> 2020-01-21 14:20:29 GMT [127.0.0.1(34256)] [2724]: [6-1]
> db=cmdv3,user=admin HINT: In a moment you should be able to reconnect to
> the database and repeat your command.
> 2020-01-21 14:20:29 GMT [127.0.0.1(34026)] [2055]: [5-1]
> db=cmdv3,user=admin WARNING: terminating connection because of crash of
> another server process
>
> At CentOS level :
> Jan 21 14:20:29 usnyh-cmd1.gt-t.net kernel: postmaster[2042]: segfault at
> 0 ip 000000000048bef4 sp 00007ffdf4955bb0 error 4 in postgres[400000+6c5000]
> Jan 21 14:20:29 usnyh-cmd1.gt-t.net pgbouncer: 2020-01-21 14:20:29.905
> GMT [33727] WARNING C-0x21526e8: cmdv3/admin(at)10(dot)151(dot)2(dot)154:39688 pooler
> error: server conn crashed?
> Jan 21 14:20:29 usnyh-cmd1.gt-t.net pgbouncer: 2020-01-21 14:20:29.905
> GMT [33727] WARNING S-0x1f5ff58: cmdv3/admin(at)127(dot)0(dot)0(dot)1:5432 got packet
> 'N' from server when not linked
> Jan 21 14:20:29 usnyh-cmd1.gt-t.net pgbouncer: 2020-01-21 14:20:29.905
> GMT [33727] WARNING S-0x25097f0: cmdv3/admin(at)127(dot)0(dot)0(dot)1:5432 got packet
> 'N' from server when not linked
> Jan 21 14:20:29 usnyh-cmd1.gt-t.net pgbouncer: 2020-01-21 14:20:29.906
> GMT [33727] WARNING S-0x2508b60: cmdv3/admin(at)127(dot)0(dot)0(dot)1:5432 got packet
> 'N' from server when not linked
> Jan 21 14:20:29 usnyh-cmd1.gt-t.net pgbouncer: 2020-01-21 14:20:29.981
> GMT [33727] ERROR S: login failed: FATAL: the database system is in
> recovery mode
>
>
> > If you're on Linux, you can probably see them with "ls /dev/shm".
>
> I see a lot of files there, and doing a cat they are empty. What can I do
> with them?
>
> Those are two different problems I guess, but they are related because
> right before the Segmentation Fault I see a lot of shared segment errors in
> the postgres log.
>
> Il giorno mer 29 gen 2020 alle ore 10:09 Thomas Munro <
> thomas(dot)munro(at)gmail(dot)com> ha scritto:
>
>> On Wed, Jan 22, 2020 at 4:06 AM Nicola Contu <nicola(dot)contu(at)gmail(dot)com>
>> wrote:
>> > after a few months, we started having this issue again.
>> > So we revert the work_mem parameter to 600MB instead of 2GB.
>> > But the issue is still there. A query went to segmentation fault, the
>> DB went to recovery mode and our app went to read only for a few minutes.
>>
>> Hi Nicola,
>> Hmm, a segmentation fault sounds like a different problem. Can you
>> please share the exact error messages from PostgreSQL and OS logs?
>>
>> > I understand we can increase max_connections so we can have many more
>> segments.
>> >
>> > My question is : is there a way to understand the number of segments we
>> reached?
>>
>> If you're on Linux, you can probably see them with "ls /dev/shm".
>>
>> > Currently we have 220 max_conn so as your formula is 64 + 2* 220 we
>> have about 500 shared segments.
>> > We would like to increase that number to 300 or 400 but would be great
>> to understand if there is a way to make sure we will solve the issue as it
>> requires a restart of the service.
>> >
>> > I know you were also talking about a redesign this part in PostgreSQL.
>> Do you know if anything has changed in any of the newer versions after 11.5?
>>
>> It's possible that we should increase a couple of constants used the
>> formula -- I'll look into that again. But first I'd like to see if
>> we're even investigating the right problem here.
>>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Munro 2020-01-29 10:53:24 Re: ERROR: too many dynamic shared memory segments
Previous Message Nicola Contu 2020-01-29 09:37:09 Re: ERROR: too many dynamic shared memory segments