Re: Cannot allocate memory

From: "JOIGNY Michael (at)Neteven" <mjoigny(at)neteven(dot)com>
To: Hannah Huang <hannah(dot)huang(dot)y(at)gmail(dot)com>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Cannot allocate memory
Date: 2020-09-24 10:18:26
Message-ID: e47eec7f-9dbc-cbe3-dd01-32cbee90f332@neteven.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi Hannah,

For the read load, this comes from the fact that during the migration, a
vacuum was made (backup / restore).

Execution times are basically the same, no major difference.

We don't see any major degradation in performance but rather random
memory errors when we get to around ~ 220GB / 377GB memory usage on our
server. On average we are at 166GB used.

Here is an example below, memory screenshot and postgres logs at the
same time :

memory

We see the first errors of 11:08 a.m. once the first ram peak then
around 11:44 a.m. and 11:51 a.m.

2020-09-24 11:08:16.087 CEST [54111]: [2-1] user=,db=,app=,client=
DETAIL:  Failed on request of size 2048 in memory context
"CacheMemoryContext".
2020-09-24 11:08:16.088 CEST [54109]: [2-1] user=,db=,app=,client=
DETAIL:  Failed on request of size 85408 in memory context
"TopMemoryContext".
2020-09-24 11:08:16.102 CEST [206338]: [245-1]
user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 40 in memory context "MessageContext".
2020-09-24 11:08:52.331 CEST [161313]: [62195-1]
user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 50331648 in memory context "ExecutorState".
2020-09-24 11:08:52.416 CEST [38548]: [18-1]
user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 20 in memory context "MessageContext".
2020-09-24 11:09:23.532 CEST [165335]: [146-1]
user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 4 in memory context "MessageContext".
2020-09-24 11:09:26.541 CEST [25318]: [130-1]
user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 16 in memory context "MessageContext".
2020-09-24 11:09:30.013 CEST [57650]: [274-1]
user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 16 in memory context "MessageContext".
2020-09-24 11:09:33.333 CEST [28889]: [57-1]
user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 40 in memory context "MessageContext".
2020-09-24 11:09:33.344 CEST [138441]: [75422-1]
user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 100663296 in memory context "ExecutorState".

2020-09-24 11:40:42.947 CEST [278349]: [224-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed while
creating memory context "ExprContext".
2020-09-24 11:40:42.957 CEST [67802]: [69-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 16 in memory context "MessageContext".
2020-09-24 11:40:42.982 CEST [147716]: [86655-1]
user=yy,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 50331648 in memory context "ExecutorState".
2020-09-24 11:41:41.390 CEST [29148]: [130-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 40 in memory context "MessageContext".
2020-09-24 11:41:46.677 CEST [68287]: [61-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 16 in memory context "MessageContext".
2020-09-24 11:41:51.623 CEST [139332]: [24-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 40 in memory context "MessageContext".
2020-09-24 11:41:56.466 CEST [103039]: [87-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 16 in memory context "MessageContext".
2020-09-24 11:43:27.745 CEST [103028]: [289-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 16 in memory context "MessageContext".
2020-09-24 11:43:34.057 CEST [84831]: [44-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 48 in memory context "MessageContext".
2020-09-24 11:43:40.780 CEST [43096]: [56-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 40 in memory context "MessageContext".
2020-09-24 11:43:47.387 CEST [99915]: [106-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 16 in memory context "MessageContext".
2020-09-24 11:44:34.639 CEST [101187]: [87-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 40 in memory context "MessageContext".
2020-09-24 11:44:59.846 CEST [99961]: [106-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 16 in memory context "MessageContext".
2020-09-24 11:44:59.858 CEST [182506]: [549-1]
user=uu_fr,db=mydb,app=[unknown],client=localhost DETAIL: Failed on
request of size 24 in memory context "MessageContext".

2020-09-24 11:50:58.903 CEST [276980]: [165-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed while
creating memory context "ExprContext".
2020-09-24 11:51:07.226 CEST [67666]: [158-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed while
creating memory context "ExprContext".
2020-09-24 11:51:16.258 CEST [30638]: [187-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 8344 in memory context "ExecutorState".
2020-09-24 11:51:16.261 CEST [214950]: [8-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 8344 in memory context "ExecutorState".
2020-09-24 11:51:16.745 CEST [67802]: [89-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 16 in memory context "MessageContext".
2020-09-24 11:51:17.265 CEST [196961]: [27-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 16 in memory context "MessageContext".
2020-09-24 11:51:17.266 CEST [182506]: [31574-1]
user=uu_fr,db=mydb,app=[unknown],client=localhost DETAIL: Failed on
request of size 40 in memory context "MessageContext".
2020-09-24 11:51:17.692 CEST [278113]: [126-1]
user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 216 in memory context "MessageContext".
2020-09-24 11:51:19.293 CEST [215439]: [2-1] user=,db=,app=,client=
DETAIL:  Failed on request of size 85408 in memory context
"TopMemoryContext".
2020-09-24 11:51:19.298 CEST [215438]: [2-1] user=,db=,app=,client=
DETAIL:  Failed while creating memory context "dynahash".
2020-09-24 11:51:19.344 CEST [182506]: [31603-1]
user=uu_fr,db=mydb,app=[unknown],client=localhost DETAIL: Failed on
request of size 40 in memory context "MessageContext".
2020-09-24 11:51:20.218 CEST [215488]: [2-1] user=,db=,app=,client=
DETAIL:  Failed on request of size 85408 in memory context
"TopMemoryContext".
2020-09-24 11:51:20.229 CEST [215485]: [2-1] user=,db=,app=,client=
DETAIL:  Failed on request of size 424 in memory context
"CacheMemoryContext".
2020-09-24 11:51:21.233 CEST [215513]: [8-1]
user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on
request of size 8344 in memory context "unnamed prepared statement".
2020-09-24 11:51:21.297 CEST [215524]: [2-1] user=,db=,app=,client=
DETAIL:  Failed while creating memory context "ExprContext".

I have a feeling that postgres memory is limited, while there is still
usable memory left on the server.

Regards.

Le 23/09/2020 à 10:00, Hannah Huang a écrit :
>
>
>> On 23 Sep 2020, at 2:14 pm, Hannah Huang <hannah(dot)huang(dot)y(at)gmail(dot)com
>> <mailto:hannah(dot)huang(dot)y(at)gmail(dot)com>> wrote:
>>
>>
>>
>>> On 23 Sep 2020, at 12:59 am, JOIGNY Michael @Neteven
>>> <mjoigny(at)neteven(dot)com <mailto:mjoigny(at)neteven(dot)com>> wrote:
>>>
>>> Hi Hannah,
>>>
>>> Thanks for your feedback.
>>>
>>> Here are the CPU / IO graphs over the same period.
>>>
>>> You will find on the graphs the annotation which specifies the
>>> moment of the migration.
>>>
>>>
>>> <cpu.png>
>>>
>>> <io_requests.png>
>>>
>>> <io_bytes.png>
>>>
>>> <io_time.png>
>>>
>>>
>>> Le 22/09/2020 à 15:10, Hannah Huang a écrit :
>>>>
>>>>
>>>>> On 22 Sep 2020, at 6:20 pm, JOIGNY Michael @Neteven
>>>>> <mjoigny(at)neteven(dot)com <mailto:mjoigny(at)neteven(dot)com>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I would like to come back to you about my memory problem on
>>>>> postgres 12.
>>>>>
>>>>> We had the same configuration under postgres 11.8, we disabled JIT
>>>>> (enabled by default under postgres 12) for segfault problems.
>>>>>
>>>>> To illustrate the change in memory behavior, here is a screenshot
>>>>> before / after migration :
>>>>>
>>>>> <memory.png>
>>>>>
>>>>>
>>>>> Do you have any idea what could change the behavior in this way?
>>>>> another parameter enabled by default under postgres 12 like JIT?
>>>>>
>>>>> Regards.
>>>>>
>>>>> Le 18/09/2020 à 12:03, JOIGNY Michael @Neteven a écrit :
>>>>>>
>>>>>> Hi Community,
>>>>>>
>>>>>> I'm asking for your lights because i'm having memory problems
>>>>>> with postgres.
>>>>>>
>>>>>> Examples of logs :
>>>>>>
>>>>>> /FATAL: could not fork new process for connection: Cannot
>>>>>> allocate memory could not fork new process for connection: Cannot
>>>>>> allocate memory
>>>>>> out of memory  DETAIL:  Failed on request of size 32800 in memory
>>>>>> context "HashBatchContext".
>>>>>> out of memory DETAIL Failed on request of size 288 in memory
>>>>>> context "CacheMemoryContext".
>>>>>> /
>>>>>>
>>>>>> We use postgresql (primary/standby) with pgbouncer as a pooler,
>>>>>> and repmgr as replication manager.
>>>>>>
>>>>>> We have ~ 2000 connections at the same time with ~ 20/30 are
>>>>>> active. (we need to set a high number of connexion on postgres,
>>>>>> because our app uses a lot of different users, and each user on
>>>>>> each app server needs multiple and constant connexions).
>>>>>>
>>>>>> Here is my configuration :
>>>>>>
>>>>>> _system : _
>>>>>>
>>>>>> Debian : 9.13
>>>>>> Memory : 380 Go
>>>>>> Postgres : 12.4-1.pgdg90+1
>>>>>> Pgbouncer : 1.14
>>>>>> kernel.shmmax = 202591600640
>>>>>> kernel.shmall = 49460840
>>>>>>
>>>>>> _postgres :
>>>>>> _
>>>>>>
>>>>>> dynamic_shared_memory_type = posix    # the default is the first
>>>>>> option
>>>>>> max_connections = 2600            # (change requires restart)
>>>>>> work_mem = 96MB                # min 64kB
>>>>>> maintenance_work_mem = 8GB        # min 1MB
>>>>>> shared_buffers = 64GB            # min 128kB
>>>>>> temp_buffers = 32MB            # min 800kB
>>>>>> wal_buffers = 16MB            # min 32kB, -1 sets based on
>>>>>> shared_buffers
>>>>>> effective_cache_size = 270GB
>>>>>>
>>>>>> pgbouncer :
>>>>>>
>>>>>> max_client_conn = 6000
>>>>>> default_pool_size = 2590
>>>>>> reserve_pool_size = 5
>>>>>> pool_mode = session
>>>>>>
>>>>>> Do you think that our parameters are not correct compared to our
>>>>>> configuration? Do you have an idea ?
>>>>>>
>>>>>> Best regards.
>>>>>>
>>>>>> Michael.
>>>>>>
>>>> System settings:
>>>> After 9.2, PostgreSQL switched to POSIX shared memory. So now it
>>>> requires fewer bytes of System V shared memory.
>>>>
>>>> You don’t need to configure SHMMAX to that high value which is
>>>> ~188GB, same as SHMALL, I would set them back to default value.
>>>>
>>>> But, from the second email you sent it seems like an issue related
>>>> to the version upgrade (from 11 to 12). So I don’t think you should
>>>> change OS parameters at this moment. I would suggest you provide
>>>> more stats on CPU (system, user, wa,ni) and disk IO - before and
>>>> after.
>>>>
>>>> Thanks, Hannah
>>
>> I can see:
>>
>> CPU stats: CPU it spends more time on user job and the wait time on
>> IO decreases, time spent on kernal space is about same as before.
>> IO stats: read drops dramatically, write slightly increases. This
>> explains why CPU wait on IO decreases.
>>
>> I would like to know how is the performance of application queries.
>> For example, the before and after execution time of the most time
>> consuming queries. Are they still about the same? Do you have any
>> database performance analytics tools installed?  I used to use
>> pgbadger to analyze  database logs which would show me the past
>> performance statistics of the database. We now have an understanding
>> about how the OS performance looks like.  It would be good to have a
>> general picture of the database performance from the db layer.
>>
>> Thanks,
>> Hannah
>>
>>
>>
>>
>
> Continue with my question regards application performance - Does
> database experience performance degradation? Or just simply more
> failed SQL with the particular memory error thrown out?
>
> Further question:
>
> How often does the memory errors occur? Are they all caused by one
> particular SQL or a certain type of SQL?
>
> Would you mind sending out the log within a particular time range?
>
>
>
>
> Thanks,
> Suya
>
--
Cordialement.
--------------------

NETEVEN
Michael JOIGNY
Service Exploitation - Ingénieur Systèmes
190 Avenue Jean Jaures - 75019 Paris
mjoigny(at)neteven(dot)com / adminsys(at)neteven(dot)com
www.neteven.com

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Magnus Hagander 2020-09-24 15:30:15 Re: HIPAA Business Associate Agreement (BAA)
Previous Message Ganesh Korde 2020-09-24 08:18:51 Re: The default database account can be accessed without a password