From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
Cc: | Rafael Thofehrn Castro <rafaelthca(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Proposal: Progressive explain |
Date: | 2025-04-01 14:37:58 |
Message-ID: | CA+Tgmobd5RH156_QEmTqHtgbtqDy=G2WJ1aJ0m-BOW4qqWVAfA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Apr 1, 2025 at 9:38 AM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
> Could you please check if you can reproduce this?
I can, and I now see that this patch has a pretty big design problem.
The assertion failure occurs when a background worker tries to call
ruleutils.c's get_parameter(), which tries to find the expression from
which the parameter was computed. To do that, it essentially looks
upward in the plan tree until it finds the source of the parameter.
For example, if the parameter came from the bar_id column of relation
foo, we would then deparse the parameter as foo.bar_id, rather than as
$1 or similar. However, this doesn't work when the deparsing is
attempted from within a parallel worker, because the parallel worker
only gets the portion of the plan it's attempting to execute, not the
whole thing. In your test case, the whole plan looks like this:
Aggregate
InitPlan 1
-> Result
InitPlan 2
-> Result
-> Gather
Workers Planned: 2
-> Parallel Append
-> Parallel Seq Scan on ab_a1_b2 ab_1
Filter: ((b = 2) AND ((a = (InitPlan 1).col1) OR
(a = (InitPlan 2).col1)))
-> Parallel Seq Scan on ab_a2_b2 ab_2
Filter: ((b = 2) AND ((a = (InitPlan 1).col1) OR
(a = (InitPlan 2).col1)))
-> Parallel Seq Scan on ab_a3_b2 ab_3
Filter: ((b = 2) AND ((a = (InitPlan 1).col1) OR
(a = (InitPlan 2).col1)))
Those references to (InitPlan 1).col1 are actually Params. I think
what's happening here is approximately that the worker tries to find
the source of those Params, but (InitPlan 1) is above the Gather node
and thus not available to the worker, and so the worker can't find it
and the assertion fails.
In one sense, it is not hard to fix this: the workers shouldn't really
be doing progressive_explain at all, because then we get a
progressive_explain from each process individually instead of one for
the query as a whole, so we could just think of having the workers
ignore the progressive_explain GUC. However, one thing I realized
earlier this morning is that the progressive EXPLAIN can't show any of
the instrumentation that is relayed back from workers to the leader
only at the end of the execution. See the code in ParallelQueryMain()
just after ExecutorFinish().
What I think this means is that this patch needs significant
rethinking to cope with parallel query. I don't think users are going
to be happy with a progressive EXPLAIN that just ignores what the
workers are doing, and I don't think they're going to be happy with N
separate progressive explains that they have to merge together to get
an overall picture of what the query is doing, and I'm absolutely
positive that they're not going to be happy with something that
crashes. I think there may be a way to come up with a good design here
that avoids these problems, but we definitely don't have time before
feature freeze (not that we were necessarily in a great place to think
of committing this before feature freeze anyway, but it definitely
doesn't make sense now that I understand this problem).
--
Robert Haas
EDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Matthias van de Meent | 2025-04-01 14:39:47 | Re: Adding skip scan (including MDAM style range skip scan) to nbtree |
Previous Message | Peter Geoghegan | 2025-04-01 14:37:50 | Re: Adding skip scan (including MDAM style range skip scan) to nbtree |