Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker

From: Craig Milhiser <craig(at)milhiser(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker
Date: 2024-10-02 00:12:41
Message-ID: CA+wnhO3kT7mxbadbteppSZVcs3FRPvGL5xTY2DbrWSGohmhyVw@mail.gmail.com
Lists: pgsql-bugs

> Since you're building from source, you could try applying the patch
> posted by Andrei Lepikhov:
>
> https://www.postgresql.org/message-id/7d763a6d-fad7-49b6-beb0-86f99ce4a6eb%40postgrespro.ru

This did not work for me. I am running out of memory.

I applied the patch, then ran make clean, make, make check, and sudo make
install. I am running the out-of-the-box Postgres configuration.

Memory figures below are from "free -m" (values in MiB).

Before loading Postgres
             total        used        free      shared  buff/cache   available
Mem:         31388         669       30467           2         639       30719
Swap:            0           0           0

After loading
             total        used        free      shared  buff/cache   available
Mem:         31388         672       30464          14         651       30715
Swap:            0           0           0

In psql, I ran
set max_parallel_workers_per_gather = 0;
then ran the query multiple times. It takes 9.5 seconds at steady state and
returns 20 rows.

Memory is still available

             total        used        free      shared  buff/cache   available
Mem:         31388         921       22547         142        8460       30466
Swap:            0           0           0

In the same psql session, I set max_parallel_workers_per_gather = 2; and ran
the query again. It ran for 1 minute, then:

2024-10-01 18:28:45.883 UTC [2586] LOG: background worker "parallel worker" (PID 4465) was terminated by signal 9: Killed
2024-10-01 18:28:45.883 UTC [2586] DETAIL: Failed process was running:
SELECT
...
2024-10-01 18:28:45.883 UTC [2586] LOG: terminating any other active server processes
2024-10-01 18:28:46.620 UTC [2586] LOG: all server processes terminated; reinitializing

I captured this as close to the end as I could:
             total        used        free      shared  buff/cache   available
Mem:         31388       31014         535        1955        2156         373
Swap:            0           0           0
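
For anyone trying to capture memory use right up to an OOM kill, a small
sampler left running in a second terminal may be easier than timing "free -m"
by hand. The following is a minimal Python sketch, not what produced the
numbers above. It assumes a Linux /proc/meminfo, and the names parse_meminfo
and sample are hypothetical.

```python
import time


def parse_meminfo(text):
    """Return the kB-valued fields of /proc/meminfo, converted to MiB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Only keep entries reported in kB; plain counters are skipped.
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0]) // 1024
    return fields


def sample(path="/proc/meminfo", interval=1.0, count=None):
    """Print free and available memory every `interval` seconds.

    Runs until interrupted when count is None, otherwise `count` times.
    """
    taken = 0
    while count is None or taken < count:
        with open(path) as f:
            info = parse_meminfo(f.read())
        print(
            time.strftime("%H:%M:%S"),
            "free", info.get("MemFree"),
            "available", info.get("MemAvailable"),
            flush=True,  # keep the last line visible if the machine hangs
        )
        taken += 1
        time.sleep(interval)
```

Calling sample() with no arguments prints one line per second until
interrupted; redirecting the output to a file on disk keeps the last samples
around after a restart.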

Though OOM conditions often mean all bets are off for behavior, I tried
something different. I rebooted, started Postgres, then ran the query. This
time I did not first set parallel_... = 0 and run the query, which is what
had populated the cache. The machine exhausts memory again but usually
"hangs", and I need to restart. Below is the frozen screen:
             total        used        free      shared  buff/cache   available
Mem:         31388       31317         240        1955        2140          70
Swap:            0           0           0

I ran these sequences multiple times. I also analyzed the data again just
to make sure.

I reverted the patch to confirm that I can still reproduce the original
issue. With parallel workers I get the same 1.8GB allocation failure. Without
parallel workers the query takes ~10 seconds. With the out-of-the-box
configuration, the patch improved single-worker performance for this query
by 5%.
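
As a sanity check on the request size in the subject line, the arithmetic can
be done directly. This is only a sketch: it assumes PostgreSQL's ordinary
allocation cap (MaxAllocSize) is 1 GiB - 1 byte, a constant taken from the
PostgreSQL source rather than from anything stated in this thread.

```python
# Request size from the error message in the subject line, in bytes.
request = 1811939328

# Assumption: PostgreSQL's MaxAllocSize, the cap for non-"huge" allocations.
MAX_ALLOC_SIZE = 0x3FFFFFFF  # 1 GiB - 1 byte

request_mib = request / 2**20   # binary megabytes
request_gb = request / 10**9    # decimal gigabytes, the "1.8GB" figure
exceeds_cap = request > MAX_ALLOC_SIZE

print(f"{request_mib:.0f} MiB, {request_gb:.2f} GB, exceeds cap: {exceeds_cap}")
```

Under that assumption the request is 1728 MiB, which is above the cap and
consistent with the "invalid DSA memory alloc request size" error.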

Thanks

On Sun, Sep 29, 2024 at 9:15 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

> On Mon, Sep 30, 2024 at 12:03 PM Craig Milhiser <craig(at)milhiser(dot)com> wrote:
> > I reproduced the issue on v17. I downloaded the source tarball, built
> > it, passed tests, put my production database, analyzed and ran the query.
> > As you expected, the same issue occurred. I have opened the incident with
> > the AWS team as well.
>
> Since you're building from source, you could try applying the patch
> posted by Andrei Lepikhov:
>
> https://www.postgresql.org/message-id/7d763a6d-fad7-49b6-beb0-86f99ce4a6eb%40postgrespro.ru
>
> I suspect we may want to limit it to a smaller number than that, as
> mentioned already, and I think we should also apply the same cap to
> the initial estimate (Andrei's patch only caps it when it decides to
> increase it, not for the initial nbatch number). I can write a patch
> like that in a few days when I return from travelling, and we can aim
> to get it into the November release, but I suspect Andrei's patch
> might already avoid the error for your case.
>
