Re: PostgreSQL 14.4 ERROR: out of memory issues

From: Aleš Zelený <zeleny(dot)ales(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: PostgreSQL 14.4 ERROR: out of memory issues
Date: 2022-07-11 13:34:20
Message-ID: CAODqTUbChP0c=Jnx7+GRqkSK3OeBW+C47iyzEf8zAvDYJ8_RQQ@mail.gmail.com
Lists: pgsql-general

Hello,

On Mon, 11 Jul 2022 at 14:47, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
wrote:

> Hi
>
> It looks like a memory leak
>

As we are collecting some data for the Postgres processes, I've found
processes with large anonymous memory (RssAnon) allocations (a sketch of the
sampling loop follows the figures below).

Some examples:
2022-07-08T14:50:24+0200 Name: postmaster Pid: 31522 RssAnon: 742600 kB
2022-07-08T14:51:13+0200 Name: postmaster Pid: 31522 RssAnon: 742600 kB
2022-07-08T14:52:05+0200 Name: postmaster Pid: 31522 RssAnon: 742852 kB
2022-07-08T14:52:58+0200 Name: postmaster Pid: 31522 RssAnon: 743112 kB
2022-07-08T14:53:50+0200 Name: postmaster Pid: 31522 RssAnon: 743112 kB
2022-07-08T14:50:24+0200 Name: postmaster Pid: 31647 RssAnon: 684632 kB
2022-07-08T14:51:13+0200 Name: postmaster Pid: 31647 RssAnon: 684632 kB
2022-07-08T14:52:05+0200 Name: postmaster Pid: 31647 RssAnon: 684676 kB
2022-07-08T14:52:58+0200 Name: postmaster Pid: 31647 RssAnon: 684700 kB
2022-07-08T14:53:50+0200 Name: postmaster Pid: 31647 RssAnon: 684824 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:52:08+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:53:01+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:53:53+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB

Other processes use less memory:
2022-07-08T14:52:08+0200 Name: postmaster Pid: 3869 RssAnon: 3256 kB
2022-07-08T14:53:02+0200 Name: postmaster Pid: 3869 RssAnon: 3256 kB
2022-07-08T14:53:54+0200 Name: postmaster Pid: 3869 RssAnon: 3256 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 4217 RssAnon: 2532 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 4217 RssAnon: 2532 kB

And Shared memory:
2022-07-08T14:59:12+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:59:20+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:59:28+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:59:37+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:59:45+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:59:53+0200 Name: postmaster Pid: 5719 RssShmem: 908264 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:52:08+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:53:01+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:53:53+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB

while other processes use far less shared memory:

2022-07-08T14:55:25+0200 Name: postmaster Pid: 8521 RssShmem: 1988 kB
2022-07-08T14:55:33+0200 Name: postmaster Pid: 8521 RssShmem: 1988 kB
2022-07-08T14:55:40+0200 Name: postmaster Pid: 8521 RssShmem: 2104 kB
2022-07-08T14:55:49+0200 Name: postmaster Pid: 8521 RssShmem: 2104 kB
2022-07-08T14:55:57+0200 Name: postmaster Pid: 8521 RssShmem: 2104 kB
2022-07-08T14:56:06+0200 Name: postmaster Pid: 8521 RssShmem: 2104 kB
2022-07-08T14:56:15+0200 Name: postmaster Pid: 8521 RssShmem: 2104 kB
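
The samples above come from a per-minute loop over /proc/<pid>/status. A
minimal sketch of such a sampler (the pgrep pattern and the output format
are illustrative, not the exact collector we run):

ts=$(date --iso-8601=seconds)
for pid in $(pgrep postmaster); do
    # Name, RssAnon and RssShmem are the fields shown in the samples above
    name=$(awk '/^Name:/ {print $2}' /proc/$pid/status)
    anon=$(awk '/^RssAnon:/ {print $2, $3}' /proc/$pid/status)
    shm=$(awk '/^RssShmem:/ {print $2, $3}' /proc/$pid/status)
    echo "$ts Name: $name Pid: $pid RssAnon: $anon RssShmem: $shm"
done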

Which processes are better candidates for investigation, the ones using more
anonymous memory or the ones using larger shared memory?
I can do the (tedious) work to get a list of statements that a given PID
executed (at least the statements that appear in the minute snapshots).
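
To tie a suspect PID to its statements, something like this could be sampled
once a minute from cron (a sketch only; the PID list simply matches the
examples above, and the output would go to a log file):

psql -X -d prematch -c "
  -- capture what the suspect backends are running at sample time
  SELECT now(), pid, state, query_start, left(query, 200)
  FROM pg_stat_activity
  WHERE pid IN (31522, 31647, 7866);"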

> ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
>> Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks);
>> 1077400 used
>> 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR: out
>> of memory
>> 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:
>> Failed on request of size 152094068 in memory context
>> "TopTransactionContext".
>> 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
>> automatic vacuum of table "prematch.replication.tab_queue_tmp"
>>
>> 2022-07-02 14:48:47 CEST [4476]: [42-1] user=,db=,host=,app= LOG: could
>> not fork worker process: Cannot allocate memory
>> 2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG: could
>> not fork worker process: Cannot allocate memory
>> terminate called after throwing an instance of 'std::bad_alloc'
>> what(): std::bad_alloc
>>
> 2022-07-02 14:48:47 CEST [4476]: [44-1] user=,db=,host=,app= LOG:
>> background worker "parallel worker" (PID 4303) was terminated by signal 6:
>> Aborted
>>
>>
> But what you see is probably just a side effect. Do you use some extension
> written in C++? This is a C++ message. This process kills Postgres.
>

These extensions are installed (no change since 14.3):

                              List of installed extensions
        Name        | Version |   Schema   |                       Description
--------------------+---------+------------+------------------------------------------------------------------------
 amcheck            | 1.3     | public     | functions for verifying relation integrity
 dblink             | 1.2     | public     | connect to other PostgreSQL databases from within a database
 file_fdw           | 1.0     | public     | foreign-data wrapper for flat file access
 hstore             | 1.8     | public     | data type for storing sets of (key, value) pairs
 hypopg             | 1.3.1   | public     | Hypothetical indexes for PostgreSQL
 pageinspect        | 1.9     | public     | inspect the contents of database pages at a low level
 pg_buffercache     | 1.3     | public     | examine the shared buffer cache
 pg_stat_kcache     | 2.2.0   | public     | Kernel statistics gathering
 pg_stat_statements | 1.9     | public     | track planning and execution statistics of all SQL statements executed
 pgcrypto           | 1.3     | public     | cryptographic functions
 pgstattuple        | 1.5     | public     | show tuple-level statistics
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 plpgsql_check      | 2.1     | public     | extended check for plpgsql functions
 postgres_fdw       | 1.1     | public     | foreign-data wrapper for remote PostgreSQL servers
 tablefunc          | 1.0     | public     | functions that manipulate whole tables, including crosstab
(15 rows)

>
> It can be related to some operation on
> prematch.replication.tab_queue_tmp, because in both cases an autovacuum was
> started.
>

On 14.3 (and also older major versions), there were no such issues. This
table is heavily updated, so the autovacuum is frequent here.
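
Autovacuum activity on that table can be confirmed from the statistics views;
a sketch of such a check (run against the prematch database):

psql -X -d prematch -c "
  -- how often autovacuum has run on the hot table, and current dead tuples
  SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, autovacuum_count
  FROM pg_stat_all_tables
  WHERE schemaname = 'replication' AND relname = 'tab_queue_tmp';"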

>
>
>>
>>
>> hba parser context: 25600 total in 6 blocks; 4464 free (8 chunks);
>> 21136 used
>> ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
>> Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks);
>> 1077400 used
>> 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:
>> out of memory
>> 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:
>> Failed on request of size 152094068 in memory context
>> "TopTransactionContext".
>> 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
>> automatic vacuum of table "prematch.replication.tab_queue_tmp"
>>
>
> I am not sure, but it looks like a corrupted table,
> prematch.replication.tab_queue_tmp
>

$ /usr/pgsql-14/bin/pg_amcheck -v -d prematch -t replication.tab_queue_tmp
--heapallindexed
pg_amcheck: including database "prematch"
pg_amcheck: in database "prematch": using amcheck version "1.3" in schema
"public"
pg_amcheck: checking heap table "prematch.replication.tab_queue_tmp"
pg_amcheck: checking btree index "prematch.replication.tab_queue_tmp_pkey"
$ echo $?
0

Something else I can check?

Thanks Ales

>
>> And a few days later again:
>> 2022-07-08 14:54:22 CEST [4791]: [81186-1]
>> user=app_evaluation,db=prematch,host=172.25.0.80,app=Evaluation_STS_UAT
>> DETAIL: parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000'
>> terminate called after throwing an instance of 'std::bad_alloc'
>> terminate called after throwing an instance of 'std::bad_alloc'
>> what(): std::bad_alloc
>> what(): std::bad_alloc
>> 2022-07-08 14:54:23 CEST [4476]: [49-1] user=,db=,host=,app= LOG:
>> background worker "parallel worker" (PID 25251) was terminated by signal 6: Aborted
>> 2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG:
>> terminating any other active server processes
>>
>>
>> Thanks Ales Zeleny
>>
>
