Postgresql 16.3 Out Of Memory

From: Radu Radutiu <rradutiu(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Postgresql 16.3 Out Of Memory
Date: 2024-06-03 10:19:25
Message-ID: CAG4Txrii4-gTMpGEKLzCSe14svoJGgrh2AY=3LAgmqN2Um=o4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

I have an out of memory problem after upgrading from postgresql 12 to 16.3.
I have identified one query that can reproduce the error on demand. Once
the query starts, it will eventually exhaust all RAM and swap until the OOM
killer will stop postgresql.
The setup is as follows:
- One VMWARE VM, RHEL 8.9, 64 GB RAM, 16 vCPU, 32 GB swap. Database is used
by a local java application that takes up to 22 GB (hard limit). The
application uses 2 db connections (long running, app uses a connection
pool). The database has about 10 tables, with 3 large tables involved in
the problem query:
relation | total_size
-------------------+------------
outputrequest | 217 GB
inputrequest | 149 GB
tran | 5041 MB

I have the following settings:
shared_buffers = '8192MB'
effective_cache_size = '24GB'
maintenance_work_mem = '2GB'
checkpoint_completion_target = '0.9'
wal_buffers = '16MB'
default_statistics_target = '100'
random_page_cost = '1.1'
effective_io_concurrency = '200'
work_mem = '104857kB'
min_wal_size = '1GB'
max_wal_size = '4GB'
max_worker_processes = '16'
max_parallel_workers_per_gather = '4'
max_parallel_workers = '16'
max_parallel_maintenance_workers = '4'
jit = 'off'

Beside OOM there is another (possibly related)problem: the postgresql
memory usage is significantly higher than the shared_buffers + the rest of
the memory used per connection. On my system with shared buffers 8G and
work_mem ~ 100M I have memory usage of 20GB with the system running just
simple inserts (the memory might have increased due to previous queries).

I have found out that if I set enable_parallel_hash = 'off', I do get high
memory usage for the problem query (above 30G) but not OOM. With the
enable_parallel_hash = 'on', I will always get OOM. For simplicity I've
restricted postgresql systemd service to cap the memory at 36 GB (so that I
don't have to wait for the 32 GB swap to be filled before OOM). This is
the memory usage I get when running the query, before systemd restarts
postgres - memory usage is 35.9 GB just below the 36 GB limit (the query is
running with explain analyze,buffers, the other 2 idle connections are from
the application):

[postgres(at)ips3 ~]$ systemctl status postgresql-16; top -u postgres -n 1 -c
● postgresql-16.service - PostgreSQL 16 database server
Loaded: loaded (/etc/systemd/system/postgresql-16.service; enabled;
vendor preset: disabled)
Drop-In: /etc/systemd/system/postgresql-16.service.d
└─override.conf
Active: active (running) since Mon 2024-06-03 04:23:16 +08; 11h ago
Docs: https://www.postgresql.org/docs/16/static/
Process: 283703 ExecStartPre=/usr/pgsql-16/bin/postgresql-16-check-db-dir
${PGDATA} (code=exited, status=0/SUCCESS)
Main PID: 283709 (postgres)
Tasks: 14 (limit: 408404)
Memory: 35.9G (limit: 36.0G)
CGroup: /system.slice/postgresql-16.service
├─283709 /usr/pgsql-16/bin/postgres -D /var/lib/pgsql/16/data/
├─283710 postgres: logger
├─295499 postgres: checkpointer
├─295500 postgres: background writer
├─295506 postgres: walwriter
├─295507 postgres: autovacuum launcher
├─295508 postgres: logical replication launcher
├─295511 postgres: db1 db1 127.0.0.1(37720) idle
├─295946 postgres: postgres db1 [local] EXPLAIN
├─295947 postgres: parallel worker for PID 295946
├─295948 postgres: parallel worker for PID 295946
├─295949 postgres: parallel worker for PID 295946
├─295950 postgres: parallel worker for PID 295946
└─296037 postgres: db1 db1 127.0.0.1(41708) idle

top - 16:14:32 up 14 days, 4:42, 1 user, load average: 5.41, 3.58, 2.25
Tasks: 405 total, 6 running, 399 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.1 us, 5.2 sy, 0.0 ni, 71.2 id, 21.3 wa, 0.7 hi, 0.4 si,
0.0 st
MiB Mem : 63873.4 total, 2162.0 free, 57177.8 used, 4533.6 buff/cache
MiB Swap: 32260.0 total, 3813.8 free, 28446.2 used. 4349.0 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
COMMAND

295947 postgres 20 0 21.2g 7.4g 654168 R 20.0 11.8 1:31.81
postgres: parallel worker for PID 295946

295949 postgres 20 0 21.3g 7.5g 656316 R 20.0 12.0 1:32.57
postgres: parallel worker for PID 295946

295950 postgres 20 0 21.2g 7.5g 654756 R 20.0 12.1 1:31.99
postgres: parallel worker for PID 295946

295946 postgres 20 0 21.2g 7.4g 655684 R 13.3 11.8 1:31.86
postgres: postgres db1 [local] EXPLAIN

295948 postgres 20 0 21.6g 6.0g 656024 R 13.3 9.6 1:37.91
postgres: parallel worker for PID 295946

283709 postgres 20 0 8960628 44144 43684 S 0.0 0.1 0:42.28
/usr/pgsql-16/bin/postgres -D /var/lib/pgsql/16/data/

283710 postgres 20 0 355288 4876 4576 S 0.0 0.0 0:04.42
postgres: logger

295499 postgres 20 0 8960936 107376 106868 S 0.0 0.2 0:01.75
postgres: checkpointer

295500 postgres 20 0 8960764 7764 7364 S 0.0 0.0 0:00.08
postgres: background writer

295506 postgres 20 0 8960764 22008 21624 S 0.0 0.0 0:00.21
postgres: walwriter

295507 postgres 20 0 8962216 6960 6496 S 0.0 0.0 0:00.00
postgres: autovacuum launcher

295508 postgres 20 0 8962232 7436 7060 S 0.0 0.0 0:00.00
postgres: logical replication launcher

295511 postgres 20 0 8974684 133608 131104 S 0.0 0.2 0:23.90
postgres: db1 db1 127.0.0.1(37720) idle

295945 postgres 20 0 274168 6236 5524 S 0.0 0.0 0:00.00 psql
-f test.sql db1

296037 postgres 20 0 8962736 13972 13684 S 0.0 0.0 0:00.00
postgres: db1 db1 127.0.0.1(41708) idle

Below is the execution plan that gives OOM:

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=41247490.48..45605473.14 rows=36397008 width=66)
Workers Planned: 4
-> Sort (cost=41246490.42..41269238.55 rows=9099252 width=66)
Sort Key: t.msg_status DESC, t.input_sequence
-> Parallel Hash Right Join (cost=33197712.34..39496299.38
rows=9099252 width=66)
Hash Cond: (snd_tro.reply_input_sequence = t.input_sequence)
-> Parallel Seq Scan on outputrequest snd_tro
(cost=0.00..5056151.17 rows=19783917 width=16)
-> Parallel Hash (cost=32856685.24..32856685.24
rows=6148088 width=58)
-> Parallel Hash Right Join
(cost=26706855.23..32856685.24 rows=6148088 width=58)
Hash Cond: (rec_tro.input_sequence =
t.input_sequence)
-> Parallel Seq Scan on outputrequest rec_tro
(cost=0.00..5056151.17 rows=19783917 width=16)
-> Parallel Hash
(cost=26492657.28..26492657.28 rows=4154076 width=50)
-> Parallel Hash Right Join
(cost=20364128.48..26492657.28 rows=4154076 width=50)
Hash Cond:
(rpl_snd_tro.reply_input_sequence = r.input_sequence)
-> Parallel Seq Scan on
outputrequest rpl_snd_tro (cost=0.00..5056151.17 rows=19783917 width=16)
-> Parallel Hash
(cost=20166158.53..20166158.53 rows=4154076 width=42)
-> Parallel Hash Right Join
(cost=14048784.52..20166158.53 rows=4154076 width=42)
Hash Cond:
(rpl_rec_tro.input_sequence = r.input_sequence)
-> Parallel Seq Scan on
outputrequest rpl_rec_tro (cost=0.00..5056151.17 rows=19783917 width=16)
-> Parallel Hash
(cost=13867042.57..13867042.57 rows=4154076 width=34)
-> Parallel Hash
Right Join (cost=6730983.28..13867042.57 rows=4154076 width=34)
Hash Cond:
(r.originalrequest_id = t.input_sequence)
-> Parallel
Seq Scan on inputrequest r (cost=0.00..6465209.90 rows=13367390 width=16)
-> Parallel
Hash (cost=6565465.33..6565465.33 rows=4154076 width=26)
->
Parallel Seq Scan on inputrequest t (cost=0.00..6565465.33 rows=4154076
width=26)

Filter: ((receive_time < '2024-05-17 00:00:00'::timestamp without time
zone) AND (input_sequence < '202406020168279904'::bigint) AND
((msg_status)::text = ANY ('{COMPLETED,REJECTED}'::text[])))
(26 rows)

This is the execution plan with enable_parallel_hash = 'off' which runs
but still uses more than 30GB RAM:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=51681038.73..56036969.38 rows=36379870 width=66)
(actual time=68534.468..72489.115 rows=14895689 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=212733952 read=3252691 dirtied=28327 written=35494,
temp read=1350681 written=1350691
-> Sort (cost=51680038.67..51702776.09 rows=9094968 width=66) (actual
time=68352.387..68953.103 rows=2979138 loops=5)
Sort Key: t.msg_status DESC, t.input_sequence
Sort Method: external merge Disk: 179448kB
Buffers: shared hit=212733952 read=3252691 dirtied=28327
written=35494, temp read=1350681 written=1350691
Worker 0: Sort Method: external merge Disk: 177528kB
Worker 1: Sort Method: external merge Disk: 178216kB
Worker 2: Sort Method: external merge Disk: 177272kB
Worker 3: Sort Method: external merge Disk: 176248kB
-> Hash Left Join (cost=8179839.27..49930701.19 rows=9094968
width=66) (actual time=25952.805..64809.594 rows=2979138 loops=5)
Hash Cond: (t.input_sequence = rec_tro.input_sequence)
Buffers: shared hit=212733888 read=3252691 dirtied=28327
written=35494, temp read=1239592 written=1239592
-> Nested Loop Left Join (cost=2.27..38995500.02
rows=6144595 width=58) (actual time=1.541..27287.533 rows=2979138 loops=5)
Buffers: shared hit=207879681 read=2037114
dirtied=1546 written=35418
-> Nested Loop Left Join (cost=1.70..30880205.10
rows=4151312 width=50) (actual time=1.236..21739.231 rows=2979138 loops=5)
Buffers: shared hit=140861726 read=2004019
dirtied=1546 written=34842
-> Nested Loop Left Join
(cost=1.13..22788518.11 rows=4151312 width=42) (actual
time=1.082..18110.990 rows=2979138 loops=5)
Buffers: shared hit=103757087 read=1971773
dirtied=1546 written=33891
-> Nested Loop Left Join
(cost=0.56..14665384.33 rows=4151312 width=34) (actual
time=0.718..14005.092 rows=2979138 loops=5)
Buffers: shared hit=66987152
read=1604817 dirtied=1546 written=20820
-> Parallel Seq Scan on
inputrequest t (cost=0.00..6561097.71 rows=4151312 width=26) (actual
time=0.291..4167.263 rows=2979138 loops=5)
Filter: ((receive_time <
'2024-05-17 00:00:00'::timestamp without time zone) AND (input_sequence <
'202406020168279904'::bigint) AND ((msg_status)::text = ANY
('{COMPLETED,REJECTED}'::text[])))
Rows Removed by Filter: 7555153
Buffers: shared hit=693060
read=888771 dirtied=1546 written=395
-> Index Scan using
inputrequest_originalrequest_id_idx on inputrequest r (cost=0.56..1.94
rows=1 width=16) (actual time=0.003..0.003 rows=0 loops=14895689)
Index Cond:
(originalrequest_id = t.input_sequence)
Buffers: shared hit=66294092
read=716046 written=20425
-> Index Scan using
outputrequest_input_sequence_idx on outputrequest rpl_rec_tro
(cost=0.57..1.95 rows=1 width=16) (actual time=0.001..0.001 rows=0
loops=14895689)
Index Cond: (input_sequence =
r.input_sequence)
Buffers: shared hit=36769935
read=366956 written=13071
-> Index Scan using
outputrequest_reply_input_sequence_idx on outputrequest rpl_snd_tro
(cost=0.57..1.94 rows=1 width=16) (actual time=0.001..0.001 rows=0
loops=14895689)
Index Cond: (reply_input_sequence =
r.input_sequence)
Buffers: shared hit=37104639 read=32246
written=951
-> Index Scan using
outputrequest_reply_input_sequence_idx on outputrequest snd_tro
(cost=0.57..1.94 rows=1 width=16) (actual time=0.002..0.002 rows=1
loops=14895689)
Index Cond: (reply_input_sequence =
t.input_sequence)
Buffers: shared hit=67017955 read=33095
written=576
-> Hash (cost=5646459.11..5646459.11 rows=79090711
width=16) (actual time=25930.971..25930.972 rows=52601291 loops=5)
Buckets: 4194304 Batches: 32 Memory Usage: 109766kB
Buffers: shared hit=4853863 read=1215577 dirtied=26781
written=76, temp written=1119615
-> Seq Scan on outputrequest rec_tro
(cost=0.00..5646459.11 rows=79090711 width=16) (actual
time=0.114..17424.298 rows=79099765 loops=5)
Buffers: shared hit=4853863 read=1215577
dirtied=26781 written=76
Planning:
Buffers: shared hit=328 read=12
Planning Time: 3.861 ms
Execution Time: 73323.032 ms
(48 rows)

After successfully running the query, the postgresql memory usage is
● postgresql-16.service - PostgreSQL 16 database server
Loaded: loaded (/etc/systemd/system/postgresql-16.service; enabled;
vendor preset: disabled)
Drop-In: /etc/systemd/system/postgresql-16.service.d
└─override.conf
Active: active (running) since Mon 2024-06-03 04:23:16 +08; 13h ago
Docs: https://www.postgresql.org/docs/16/static/
Process: 283703 ExecStartPre=/usr/pgsql-16/bin/postgresql-16-check-db-dir
${PGDATA} (code=exited, status=0/SUCCESS)
Main PID: 283709 (postgres)
Tasks: 9 (limit: 408404)
Memory: 31.3G (limit: 36.0G)
CGroup: /system.slice/postgresql-16.service
├─283709 /usr/pgsql-16/bin/postgres -D /var/lib/pgsql/16/data/
├─283710 postgres: logger
├─296122 postgres: checkpointer
├─296123 postgres: background writer
├─296128 postgres: walwriter
├─296129 postgres: autovacuum launcher
├─296130 postgres: logical replication launcher
├─296133 postgres: db1 db1 127.0.0.1(52340) idle
└─297725 postgres: db1 db1 127.0.0.1(33216) idle

top - 18:09:44 up 14 days, 6:37, 1 user, load average: 0.01, 0.07, 0.08
Tasks: 394 total, 2 running, 392 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.4 us, 0.7 sy, 0.0 ni, 98.5 id, 0.0 wa, 0.0 hi, 0.4 si,
0.0 st
MiB Mem : 63873.4 total, 3798.3 free, 22165.3 used, 37909.8 buff/cache
MiB Swap: 32260.0 total, 31045.3 free, 1214.7 used. 32600.4 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
COMMAND

297728 postgres 20 0 264520 4844 3808 R 6.2 0.0 0:00.02 top
-u postgres -n 1 -c

283709 postgres 20 0 8960628 198604 197992 S 0.0 0.3 1:23.66
/usr/pgsql-16/bin/postgres -D /var/lib/pgsql/16/data/

283710 postgres 20 0 355288 4808 4576 S 0.0 0.0 0:04.43
postgres: logger

294897 postgres 20 0 226848 5936 3296 S 0.0 0.0 0:00.18
-bash

296122 postgres 20 0 8960912 2.5g 2.5g S 0.0 4.0 0:03.75
postgres: checkpointer

296123 postgres 20 0 8960764 161884 161160 S 0.0 0.2 0:00.41
postgres: background writer

296128 postgres 20 0 8960764 22404 21748 S 0.0 0.0 0:00.49
postgres: walwriter

296129 postgres 20 0 8962216 7800 6872 S 0.0 0.0 0:00.00
postgres: autovacuum launcher

296130 postgres 20 0 8962232 7784 6884 S 0.0 0.0 0:00.00
postgres: logical replication launcher

296133 postgres 20 0 8964788 2.2g 2.2g S 0.0 3.6 1:29.51
postgres: db1 db1 127.0.0.1(52340) idle

297725 postgres 20 0 8962736 17620 16076 S 0.0 0.0 0:00.00
postgres: db1 db1 127.0.0.1(33216) idle

Does anyone else see higher than expected memory usage reported by
systemctl status?
Do you have any idea how to further debug the problem?

Best Regards,
Radu

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Greg Sabino Mullane 2024-06-03 13:12:13 Re: Postgresql 16.3 Out Of Memory
Previous Message Sam Kidman 2024-06-03 08:06:18 Poor performance after restoring database from snapshot on AWS RDS