From: | Craig Milhiser <craig(at)milhiser(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker |
Date: | 2024-10-02 00:12:41 |
Message-ID: | CA+wnhO3kT7mxbadbteppSZVcs3FRPvGL5xTY2DbrWSGohmhyVw@mail.gmail.com |
Lists: | pgsql-bugs |
> Since you're building from source, you could try applying the patch
> posted by Andrei Lepikhov:
>
> https://www.postgresql.org/message-id/7d763a6d-fad7-49b6-beb0-86f99ce4a6eb%40postgrespro.ru
This did not work for me. I am running out of memory.
I applied the patch, then ran make clean, make, make check, and sudo make
install. I am running the out-of-the-box Postgres configuration.
Memory below uses "free -m".
Before loading Postgres
               total        used        free      shared  buff/cache   available
Mem:           31388         669       30467           2         639       30719
Swap:              0           0           0
After loading
               total        used        free      shared  buff/cache   available
Mem:           31388         672       30464          14         651       30715
Swap:              0           0           0
I go into psql
set max_parallel_workers_per_gather = 0;
I then run the query multiple times. It takes 9.5 seconds at steady state
and returns 20 rows.
Memory is still available
               total        used        free      shared  buff/cache   available
Mem:           31388         921       22547         142        8460       30466
Swap:              0           0           0
In the same psql session, set max_parallel_workers_per_gather = 2; then run
the query again. This runs for 1 minute then:
2024-10-01 18:28:45.883 UTC [2586] LOG: background worker "parallel worker" (PID 4465) was terminated by signal 9: Killed
2024-10-01 18:28:45.883 UTC [2586] DETAIL: Failed process was running:
SELECT
...
2024-10-01 18:28:45.883 UTC [2586] LOG: terminating any other active server processes
2024-10-01 18:28:46.620 UTC [2586] LOG: all server processes terminated; reinitializing
I captured this memory reading as close to the end as I could:
               total        used        free      shared  buff/cache   available
Mem:           31388       31014         535        1955        2156         373
Swap:              0           0           0
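To put a number on the difference between the serial and parallel runs, here is a minimal Python sketch that parses two of the "free -m" readings from this message and subtracts the "available" columns. The helper name parse_free is made up for illustration; the values are copied from the readings above and are in MiB.

```python
def parse_free(text):
    """Parse `free -m` output into {row_name: {column: MiB}}."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    columns = lines[0].split()
    rows = {}
    for line in lines[1:]:
        name, *values = line.split()
        # The Swap row has only three values, so zip() stops at total/used/free.
        rows[name.rstrip(":")] = dict(zip(columns, map(int, values)))
    return rows


# Reading after the serial runs (max_parallel_workers_per_gather = 0).
serial = parse_free("""
               total        used        free      shared  buff/cache   available
Mem:           31388         921       22547         142        8460       30466
Swap:              0           0           0
""")

# Reading taken just before the parallel worker was killed.
parallel = parse_free("""
               total        used        free      shared  buff/cache   available
Mem:           31388       31014         535        1955        2156         373
Swap:              0           0           0
""")

drop = serial["Mem"]["available"] - parallel["Mem"]["available"]
print(f"available memory dropped by {drop} MiB")
```

With no swap configured, that drop is essentially all of the machine's memory, which is consistent with the kernel killing the worker with signal 9.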
Though OOM conditions often mean all bets are off for behavior, I tried
something different. I rebooted, started Postgres, then ran the query
directly, without first setting parallel_... = 0 and running the query to
populate the cache. The machine exhausts memory again, but this time it
usually "hangs" and I need to restart it.
Below is the frozen screen:
               total        used        free      shared  buff/cache   available
Mem:           31388       31317         240        1955        2140          70
Swap:              0           0           0
I ran these sequences multiple times. I also analyzed the data again just
to make sure.
I reverted the patch to confirm that I am still reproducing the original
issue. With parallel workers I get the same 1.8GB allocation failure.
Without parallel workers the query takes ~10 seconds. With the
out-of-the-box configuration, the patch improved single-worker performance
for this query by 5%.
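As a rough check of the numbers in the original error, here is a small Python sketch comparing the request size from the subject line with the ordinary allocation limit. It assumes the limit being hit is MaxAllocSize (0x3fffffff, just under 1 GiB) from PostgreSQL's memutils.h. That is my reading of the source, not something confirmed in this thread.

```python
# Assumption: PostgreSQL's MaxAllocSize from memutils.h (1 GiB - 1 byte).
MAX_ALLOC_SIZE = 0x3FFFFFFF

# Request size copied from the error message in the subject line.
request = 1811939328

MIB = 1024 * 1024
over_limit = request > MAX_ALLOC_SIZE

print(f"request: {request / MIB:.0f} MiB ({request / 1e9:.2f} GB decimal)")
print(f"limit:   {MAX_ALLOC_SIZE / MIB:.0f} MiB")
print(f"request exceeds limit: {over_limit}")
```

The request works out to 1728 MiB, which is the "1.8GB" figure in decimal units and well above the roughly 1024 MiB limit assumed here.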
Thanks
On Sun, Sep 29, 2024 at 9:15 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Mon, Sep 30, 2024 at 12:03 PM Craig Milhiser <craig(at)milhiser(dot)com> wrote:
> > I reproduced the issue on v17. I downloaded the source tarball, built
> > it, passed tests, put my production database, analyzed and ran the query.
> > As you expected, the same issue occurred. I have opened the incident with
> > the AWS team as well.
>
> Since you're building from source, you could try applying the patch
> posted by Andrei Lepikhov:
>
>
> https://www.postgresql.org/message-id/7d763a6d-fad7-49b6-beb0-86f99ce4a6eb%40postgrespro.ru
>
> I suspect we may want to limit it to a smaller number than that, as
> mentioned already, and I think we should also apply the same cap to
> the initial estimate (Andrei's patch only caps it when it decides to
> increase it, not for the initial nbatch number). I can write a patch
> like that in a few days when I return from travelling, and we can aim
> to get it into the November release, but I suspect Andrei's patch
> might already avoid the error for your case.
>