Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Craig Milhiser <craig(at)milhiser(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker
Date: 2024-10-14 05:45:09
Message-ID: 96a328e9-4fbf-433b-a8d7-071da18e12c4@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 10/14/24 04:08, Thomas Munro wrote:
> On Mon, Oct 14, 2024 at 12:23 AM Craig Milhiser <craig(at)milhiser(dot)com> wrote:
>> I have reproduced the problem with synthetic data. The script is below. Thank you for your patience with me.
>
> Thanks, repro'd here. At first glance, it looks like it's trying to
> load this distribution into a hash table and failing to handle the
> skew as well as non-parallel hash:
> Getting coffee and looking more closely...
Hmm, with reproduction, it is too easy to solve ;)
My explanation (correct if I'm wrong):
OUTER JOINs allow NULLs to be in a hash table. At the same time, a hash
value for NULL is 0, and it goes to the batch==0.
If batch number 0 gets overfilled, the
ExecParallelHashIncreaseNumBatches routine attempts to increase the
number of batches - but nothing happens. The initial batch is still too
big, and the number of batches doubles up to the limit.
At the limit, parallel HashJoin stops this grow and (I didn't trace this
part, just guess) allocates memory for 2097152 batches that causes OOM.
To support this chain of thought, you can see the simple example below,
which triggers the issue:

DROP TABLE IF EXISTS test;
CREATE TABLE test (n int);
INSERT INTO test (n) SELECT NULL FROM generate_series(1,1E6);
INSERT INTO test (n) VALUES (1, 'a');
ANALYZE test;

SET enable_nestloop = 'off';
SET enable_mergejoin = 'off';

SET max_parallel_workers_per_gather = 2;
SET min_parallel_table_scan_size = 0;
SET min_parallel_index_scan_size = 0;
SET parallel_setup_cost = 0.001;
SET parallel_tuple_cost = 0.0001;

EXPLAIN (ANALYZE, VERBOSE, COSTS OFF)
SELECT t1.n FROM test t1 LEFT JOIN test t2 USING (n);

I think, now it is much easier to find a proper solution.

--
regards, Andrei Lepikhov

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Amit Kapila 2024-10-14 06:25:34 Re: BUG #18644: ALTER PUBLICATION ... SET (publish_via_partition_root) wrong/undocumented behavior.
Previous Message Thomas Munro 2024-10-13 21:08:52 Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker