Re: Postgresql OOM

From: Radu Radutiu <rradutiu(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Postgresql OOM
Date: 2024-06-12 11:28:56
Message-ID: CAG4TxrjjjR3BfRWsV3z0c2BL9XZueFPG0uGxeHrhCC+92g05Gg@mail.gmail.com
Lists: pgsql-hackers

Hi,

> That's helpful, thanks!
>
> One thing to note is that postgres' work_mem is confusing - it applies to
> individual query execution nodes, not the whole query. Additionally, when
> you use parallel workers, each of the parallel workers can use the "full"
> work_mem, rather than work_mem being evenly distributed across workers.
>
> Within that, the memory usage in the EXPLAIN ANALYZE isn't entirely
> unexpected (I'd say it's unreasonable if you're not a postgres dev, but
> that's a different issue).
>
> We can see each of the Hash nodes use ~1GB, which is due to
> (1 leader + 4 workers) * work_mem = 5 * 200MB = 1GB.
>
> In this specific query we probably could free the memory in the "lower"
> hash join nodes once the node directly above has finished building, but
> we don't have the logic for that today.
>

I would understand 1 GB, or even 2 GB (considering hash_mem_multiplier), but
the server ran out of memory with more than 64 GB available.
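To spell out the ceiling I had in mind (a back-of-the-envelope sketch using
the 200MB work_mem and 4 workers from my configuration, and assuming the
hash_mem_multiplier default of 2.0 from v15 onward):

```python
# Rough per-Hash-node memory ceiling in a parallel hash join:
# each participating process (leader + workers) may use up to
# work_mem * hash_mem_multiplier for its share of the hash table.
work_mem_mb = 200            # work_mem from my configuration
workers = 4                  # max_parallel_workers_per_gather
hash_mem_multiplier = 2.0    # default since PostgreSQL 15

per_node_mb = (1 + workers) * work_mem_mb * hash_mem_multiplier
print(per_node_mb / 1024)    # prints 1.953125, i.e. ~2 GB per Hash node
```

Even with several Hash nodes in the plan, that should stay far below 64 GB.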

>
> Of course that doesn't explain why the memory usage / temp file creation
> is so extreme with a lower limit / fewer workers. There aren't any bad
> statistics afaics, nor can I really see a potential for a significant
> skew, it looks to me that the hashtables are all built on a single text
> field and that nearly all the input rows are distinct.
>
> Could you post the table definition (\d+) and the database definition
> (\l). One random guess I have is that you ended up with a
> "non-deterministic" collation for that column and that we end up with a
> bad hash due to that.
>

I was able to eliminate the columns not involved in the query while still
retaining the problematic behavior (that's the reason for the new table
names):
postgres=# \d inreq
                                  Table "public.inreq"
         Column          |            Type             | Collation | Nullable | Default
-------------------------+-----------------------------+-----------+----------+---------
 input_sequence          | bigint                      |           | not null |
 msg_type                | character varying(8)        |           |          |
 originalrequest_id      | bigint                      |           |          |
 receive_time            | timestamp without time zone |           |          |
 related_output_sequence | bigint                      |           |          |
 msg_status              | character varying(15)       |           |          |
Indexes:
    "inreq_pkey" PRIMARY KEY, btree (input_sequence)
    "inreq_originalrequest_id_idx" btree (originalrequest_id)

postgres=# \d outreq
Table "public.outreq"
Column | Type | Collation | Nullable | Default
------------------------+--------+-----------+----------+---------
output_sequence | bigint | | not null |
input_sequence | bigint | | |
reply_input_sequence | bigint | | |
related_input_sequence | bigint | | |
Indexes:
"outreq_pkey" PRIMARY KEY, btree (output_sequence)
"outreq_input_sequence_idx" btree (input_sequence)
"outreq_reply_input_sequence_idx" btree (reply_input_sequence)
postgres=# SELECT datname, datcollate FROM pg_database WHERE datname = current_database();
 datname  | datcollate
----------+-------------
 postgres | en_US.UTF-8
(1 row)
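To rule out the non-deterministic collation guess more directly, any
non-deterministic collations defined in the database can be listed straight
from the catalog (collisdeterministic is available in PG 12+; the columns
above all use the deterministic database default):

```sql
-- List any non-deterministic collations defined in this database;
-- an empty result means the hash should not be affected by collation.
SELECT collname, collprovider
FROM pg_collation
WHERE NOT collisdeterministic;
```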

A dump of the two tables above can be found at
https://drive.google.com/file/d/1ep1MYjNzlFaICL3GlPZaMdpOxRG6WCGz/view?usp=sharing
(compressed size 1 GB; size of database after import 14 GB).

# prepare my_query (timestamp, bigint) as
SELECT t.input_sequence, rec_tro.output_sequence, r.input_sequence,
       rpl_rec_tro.output_sequence, rpl_snd_tro.output_sequence,
       snd_tro.output_sequence, t.msg_type
FROM inreq t
LEFT JOIN outreq rec_tro     ON rec_tro.input_sequence = t.input_sequence
LEFT JOIN inreq r            ON r.originalRequest_id = t.input_sequence
LEFT JOIN outreq rpl_rec_tro ON rpl_rec_tro.input_sequence = r.input_sequence
LEFT JOIN outreq rpl_snd_tro ON rpl_snd_tro.reply_input_sequence = r.input_sequence
LEFT JOIN outreq snd_tro     ON snd_tro.reply_input_sequence = t.input_sequence
WHERE t.receive_time < $1
  AND t.input_sequence < $2
  AND t.msg_status IN ('COMPLETED', 'REJECTED')
ORDER BY t.msg_status DESC, t.input_sequence;
# EXPLAIN (ANALYZE, BUFFERS) EXECUTE my_query('2024-05-18 00:00:00', 202406020168279904);
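For reproducing the "lower limit / fewer workers" case, the knobs to vary
between runs are along these lines (the values below are illustrative, not
the exact settings from every run):

```sql
SET work_mem = '100MB';
SET max_parallel_workers_per_gather = 2;
EXPLAIN (ANALYZE, BUFFERS)
EXECUTE my_query('2024-05-18 00:00:00', 202406020168279904);
```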

Best Regards,
Radu
