From: | Arne Roland <A(dot)Roland(at)index(dot)de> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Sand Stone <sand(dot)m(dot)stone(at)gmail(dot)com> |
Cc: | Rick Otten <rottenwindfish(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | RE: dsa_allocate() faliure |
Date: | 2019-01-24 14:44:41 |
Message-ID: | 6f3fe9fa5a984dc19e40e79fbef45edc@index.de |
Lists: | pgsql-hackers pgsql-performance |
Hello,
I'm not sure whether this is connected at all, but I'm facing the same error with a generated query on postgres 10.6.
It works with parallel query disabled and gives "dsa_allocate could not find 7 free pages" otherwise.
I've attached the query and an strace log. The table is partitioned on (o, date). The failure does not depend on the precise lists I'm using, but it obviously does depend on the optimizer choosing a parallel query plan.
Regards
Arne Roland
-----Original Message-----
From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Sent: Friday, October 5, 2018 4:17 AM
To: Sand Stone <sand(dot)m(dot)stone(at)gmail(dot)com>
Cc: Rick Otten <rottenwindfish(at)gmail(dot)com>; Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>; pgsql-performance(at)lists(dot)postgresql(dot)org; Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: dsa_allocate() faliure
On Wed, Aug 29, 2018 at 5:48 PM Sand Stone <sand(dot)m(dot)stone(at)gmail(dot)com> wrote:
> I attached a query (and its query plan) that caused the crash: "dsa_allocate could not find 13 free pages" on one of the worker nodes. I anonymised the query text a bit. Interestingly, this time only one (the same one) of the nodes is crashing. Since this is a production environment, I cannot get the stack trace. Once I turned off parallel execution for this node, the whole query finished just fine. So the parallel query plan is from one of the nodes that did not crash; hopefully the same plan would have been executed on the crashed node. In theory, every worker node has the same bits, and very similar data.
I wonder if this was a different symptom of the problem fixed here:
https://www.postgresql.org/message-id/flat/194c0706-c65b-7d81-ab32-2c248c3e2344%402ndquadrant.com
Can you still reproduce it on current master, REL_11_STABLE or REL_10_STABLE?
--
Thomas Munro
http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
strace.log | application/octet-stream | 435.5 KB |
query.sql | application/octet-stream | 141.1 KB |