Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Martijn van Oosterhout <kleptog(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17564: Planner bug in combination of generate_series(), unnest() and ORDER BY
Date: 2022-08-03 07:45:19
Message-ID: CAMbWs49WiiONtw7-zje2O31GOwZ4XxZNNub=Y_ChA4ecC9x3yQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, Aug 2, 2022 at 7:28 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

>
> On Tue, Aug 2, 2022 at 4:50 PM Martijn van Oosterhout <kleptog(at)gmail(dot)com>
> wrote:
>
>> Now it's morning I've thought of a way to reproduce it more easily, see
>> the attached script. The tricky part is getting the tuples in a position
>> that triggers the planner in the right way. So the script includes a list
>> of (ctid, primary key) and creates a table using that with quite a large
>> amount of bloat. It then creates some constraints, vacuums and runs the
>> offending query. On my system it reproduces with 100% reliability (so far
>> anyway).
>>
>
> Thanks for the report! I can reproduce it on HEAD. The part of the plan
> that causes problem looks like:
>
> -> Gather Merge
> Output: results.group_id
> Workers Planned: 1
> -> Sort
> Output: results.group_id, ((unnest('{}'::jsonb[]) ->>
> 'timestamp'::text))
> Sort Key: ((unnest('{}'::jsonb[]) ->> 'timestamp'::text))
> -> Parallel Seq Scan on public.results
> Output: results.group_id, (unnest('{}'::jsonb[]) ->>
> 'timestamp'::text)
> Filter: (results.search_id = 3336)
>
> Please note that the expression below appears in the targetlist of
> seqscan:
>
> unnest('{}'::jsonb[]) ->> 'timestamp'::text
>
> The func for operator '->>' itself is not proretset, but one of its args
> (the unnest func) is proretset. And that causes problem when we set up
> projection info for the seqscan node.
>
> So why does this expression appear in the targetlist of seqscan node? I
> did some debug on that. Since this expression is the desired pathkey of
> the query, relation_can_be_sorted_early() checks on it and finds that it
> can be computed from the reltarget of rel 'results', which is true since
> this expression can be computed all by itself. So it is considered as a
> useful ordering for rel 'results' and generate_useful_gather_paths()
> creates the Sort and then Gather Merge nodes for 'results' based on this
> pathkey.
>
> Still need more time to investigate for the fix.
>

In relation_can_be_sorted_early(), we try to find an EC member that
matches some reltarget or can be computed from the reltarget. If we find
out such an EC member, we check whether it involves set-returning
functions (and reject it if so) with IS_SRF_CALL, which only tests
funcretset/opretset flag of the expression, without checking further
into its args. I'm wondering if this is enough.

I'm considering a fix as checking the EC member expression recursively
with expression_returns_set(), something like below:

diff --git a/src/backend/optimizer/path/equivclass.c
b/src/backend/optimizer/path/equivclass.c
index 60c0e3f108..7991295548 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -986,7 +986,7 @@ relation_can_be_sorted_early(PlannerInfo *root,
RelOptInfo *rel,
* one are effectively checking properties of targetexpr,
so there's
* no point in asking whether some other EC member would be
better.)
*/
- if (IS_SRF_CALL((Node *) em->em_expr))
+ if (expression_returns_set((Node *) em->em_expr))
continue;

/*
@@ -1014,7 +1014,7 @@ relation_can_be_sorted_early(PlannerInfo *root,
RelOptInfo *rel,
* member in this case; since SRFs can't appear in WHERE, they
cannot
* belong to multi-member ECs.)
*/
- if (IS_SRF_CALL((Node *) em->em_expr))
+ if (expression_returns_set((Node *) em->em_expr))
return false;

return true;

Does this make any sense?

Thanks
Richard

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2022-08-03 10:09:07 BUG #17567: Unable to Set Max_Connection in Postgresql which has replicas
Previous Message 王海洋 2022-08-03 06:37:30 [PATCH] BUG FIX: inconsistent page found in BRIN_REGULAR_PAGE