Re: Postgresql 16.3 Out Of Memory

From: Radu Radutiu <rradutiu(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgresql 16.3 Out Of Memory
Date: 2024-06-05 08:16:49
Message-ID: CAG4TxrjnFxwWLEYrvt90RJ3EC8_kqbcHt87ei6w70W0003RaOw@mail.gmail.com
Lists: pgsql-general

It looks like I did not copy the list.

> I did run VACUUM ANALYZE after the upgrade. I've even run it now
> before getting a new explain plan that is very similar (just the costs
> differ):
>
> Gather Merge  (cost=12336145.92..16111570.23 rows=31531596 width=66)
>   Workers Planned: 4
>   ->  Sort  (cost=12335145.86..12354853.11 rows=7882899 width=66)
>         Sort Key: t.msg_status DESC, t.input_sequence
>         ->  Parallel Hash Right Join  (cost=9376528.66..11242773.26 rows=7882899 width=66)
>               Hash Cond: (snd_tro.reply_input_sequence = t.input_sequence)
>               ->  Parallel Seq Scan on tbl_outputrequest snd_tro  (cost=0.00..1431919.45 rows=20057645 width=16)
>               ->  Parallel Hash  (cost=9254599.76..9254599.76 rows=5245992 width=58)
>                     ->  Parallel Hash Right Join  (cost=7458636.44..9254599.76 rows=5245992 width=58)
>                           Hash Cond: (rec_tro.input_sequence = t.input_sequence)
>                           ->  Parallel Seq Scan on tbl_outputrequest rec_tro  (cost=0.00..1431919.45 rows=20057645 width=16)
>                           ->  Parallel Hash  (cost=7380902.99..7380902.99 rows=3491156 width=50)
>                                 ->  Parallel Hash Right Join  (cost=5592677.17..7380902.99 rows=3491156 width=50)
>                                       Hash Cond: (rpl_snd_tro.reply_input_sequence = r.input_sequence)
>                                       ->  Parallel Seq Scan on tbl_outputrequest rpl_snd_tro  (cost=0.00..1431919.45 rows=20057645 width=16)
>                                       ->  Parallel Hash  (cost=5518353.72..5518353.72 rows=3491156 width=42)
>                                             ->  Parallel Hash Right Join  (cost=3729209.40..5518353.72 rows=3491156 width=42)
>                                                   Hash Cond: (rpl_rec_tro.input_sequence = r.input_sequence)
>                                                   ->  Parallel Seq Scan on tbl_outputrequest rpl_rec_tro  (cost=0.00..1431919.45 rows=20057645 width=16)
>                                                   ->  Parallel Hash  (cost=3658294.95..3658294.95 rows=3491156 width=34)
>                                                         ->  Parallel Hash Right Join  (cost=1883503.35..3658294.95 rows=3491156 width=34)
>                                                               Hash Cond: (r.originalrequest_id = t.input_sequence)
>                                                               ->  Parallel Seq Scan on tbl_inputrequest r  (cost=0.00..1739752.66 rows=13348166 width=16)
>                                                               ->  Parallel Hash  (cost=1839863.91..1839863.91 rows=3491156 width=26)
>                                                                     ->  Parallel Seq Scan on tbl_inputrequest t  (cost=0.00..1839863.91 rows=3491156 width=26)
>                                                                           Filter: ((receive_time < '2024-05-17 00:00:00'::timestamp without time zone) AND (input_sequence < '202406020168279904'::bigint) AND ((msg_status)::text = ANY ('{COMPLETED,REJECTED}'::text[])))
>
> The query cost is high and it returns a significant number of rows.
> However, it should not consume 64+ GB of RAM with the default
> enable_parallel_hash = 'on' when my shared_buffers is 8GB. The temporary
> fix I've implemented to get the system working is a change in the
> application configuration so that the timestamp filter selects fewer rows
> (3x fewer), together with setting enable_parallel_hash = 'off'. Even with
> this setting, the PostgreSQL service's memory usage grows and stays above
> 20GB. I'd like to find out exactly what causes the high memory usage, as
> we have other projects using PostgreSQL that are scheduled for upgrade
> from v12.
>
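For rough intuition, the build sides of the Parallel Hash nodes in the plan above can be sized from their reported row counts and widths. The sketch below is a back-of-the-envelope estimate only; the participant count (4 workers + leader) and the per-tuple overhead are assumptions, not measured values:

```python
# Rough sizing of the shared hash tables built by the Parallel Hash nodes
# in the plan above. Plan row counts are per-participant estimates, so the
# totals are scaled by the assumed number of participants.
PARTICIPANTS = 5       # assumption: 4 planned workers plus the leader
TUPLE_OVERHEAD = 24    # assumption: rough per-tuple bookkeeping, in bytes

# (rows, width) taken from each "Parallel Hash" node in the plan
builds = [
    (5245992, 58),
    (3491156, 50),
    (3491156, 42),
    (3491156, 34),
    (3491156, 26),
]

total_bytes = sum(rows * PARTICIPANTS * (width + TUPLE_OVERHEAD)
                  for rows, width in builds)
print(f"estimated hash build memory: {total_bytes / 1024**3:.1f} GiB")
```

Even this crude estimate lands in the multi-GiB range, well above what a work_mem-sized budget would allow in memory at once, so the interesting question is why the executor's actual consumption grows so far beyond it instead of spilling to batches.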
> My test.sql looks like this (the application uses prepared statements; the
> two SET statements are there to revert, for the current session, the fixes
> already implemented):
>
> prepare my_query (timestamp, bigint) as
> SELECT t.input_sequence, rec_tro.output_sequence, r.input_sequence,
>        rpl_rec_tro.output_sequence, rpl_snd_tro.output_sequence,
>        snd_tro.output_sequence, t.msg_type
> FROM inputrequest t
>      LEFT JOIN outputrequest rec_tro     ON rec_tro.input_sequence = t.input_sequence
>      LEFT JOIN inputrequest r            ON r.originalRequest_id = t.input_sequence
>      LEFT JOIN outputrequest rpl_rec_tro ON rpl_rec_tro.input_sequence = r.input_sequence
>      LEFT JOIN outputrequest rpl_snd_tro ON rpl_snd_tro.reply_input_sequence = r.input_sequence
>      LEFT JOIN outputrequest snd_tro     ON snd_tro.reply_input_sequence = t.input_sequence
> WHERE t.receive_time < $1
>   AND t.input_sequence < $2
>   AND t.msg_status IN ('COMPLETED', 'REJECTED')
> ORDER BY t.msg_status DESC, t.input_sequence;
>
> EXPLAIN EXECUTE my_query('2024-05-17 00:00:00', 202406020168279904);
>
I have an explanation for what I suspected was a memory leak: it seems
that systemd reports cached memory, and the relevant figure is probably rss
from /sys/fs/cgroup/memory/system.slice/postgresql-16.service/memory.stat.
That still leaves the out-of-memory situation unexplained.
I've managed to get a dump of the memory contexts for the 5 processes
(explain + 4 parallel workers) while the query was starting to use
significant memory (>10G). I've attached it in case anyone familiar with
PostgreSQL internals can spot something obviously wrong in the reported
numbers.
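For anyone cross-checking a dump like the attached mem.txt, the per-context "total" figures for one process can be summed with a short script. This is a minimal sketch assuming the usual MemoryContextStats line format (as emitted by pg_log_backend_memory_contexts); the sample lines are illustrative, not taken from mem.txt:

```python
import re

# Sum the "<N> total in <M> blocks" figures from a PostgreSQL memory-context
# dump, giving a quick per-process total to compare against OS-level rss.
# Line format assumed from MemoryContextStats output; sample is illustrative.
TOTAL_RE = re.compile(r"(\d+) total in \d+ blocks")

def context_total(dump: str) -> int:
    return sum(int(n) for n in TOTAL_RE.findall(dump))

sample = (
    "TopMemoryContext: 97664 total in 5 blocks; 7360 free (11 chunks); 90304 used\n"
    "  CacheMemoryContext: 524288 total in 7 blocks; 92160 free (1 chunks); 432128 used\n"
)
print(context_total(sample))
```

A large gap between this sum and the process's rss would point at memory held outside the tracked contexts (e.g. allocator overhead or shared segments), which narrows down where to look.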

Radu

Attachment: mem.txt (text/plain, 61.4 KB)
