Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: James Coleman <jtc331(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
Date: 2021-04-16 02:27:17
Message-ID: CALNJ-vRKAJ8pneQuQaM7+UVw=CtOHrH9zX4EHaHtTQ-P_OKePg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 15, 2021 at 6:27 PM Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
wrote:

>
>
> On 4/15/21 7:35 PM, James Coleman wrote:
> > On Thu, Apr 15, 2021 at 5:33 AM Luc Vlaming <luc(at)swarm64(dot)com> wrote:
> >>
> >> On 15-04-2021 04:01, James Coleman wrote:
> >>> On Wed, Apr 14, 2021 at 5:42 PM James Coleman <jtc331(at)gmail(dot)com>
> wrote:
> >>>>
> >>>> On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra
> >>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >>>>>
> >>>>> On 4/12/21 2:24 PM, Luc Vlaming wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> When trying to run on master (but afaik also PG-13) TPC-DS queries
> 94,
> >>>>>> 95 and 96 on a SF10 I get the error "could not find pathkey item to
> sort".
> >>>>>> When I disable enable_gathermerge the problem goes away and then the
> >>>>>> plan for query 94 looks like below. I tried figuring out what the
> >>>>>> problem is but to be honest I would need some pointers as the code
> that
> >>>>>> tries to matching equivalence members in prepare_sort_from_pathkeys
> is
> >>>>>> something i'm really not familiar with.
> >>>>>>
> >>>>>
> >>>>> Could be related to incremental sort, which allowed some gather merge
> >>>>> paths that were impossible before. We had a couple issues related to
> >>>>> that fixed in November, IIRC.
> >>>>>
> >>>>>> To reproduce you can either ingest and test using the toolkit I
> used too
> >>>>>> (see https://github.com/swarm64/s64da-benchmark-toolkit/), or
> >>>>>> alternatively just use the schema (see
> >>>>>>
> https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native
> )
> >>>>>>
> >>>>>
> >>>>> Thanks, I'll see if I can reproduce that with your schema.
> >>>>>
> >>>>>
> >>>>> regards
> >>>>>
> >>>>> --
> >>>>> Tomas Vondra
> >>>>> EnterpriseDB: http://www.enterprisedb.com
> >>>>> The Enterprise PostgreSQL Company
> >>>>
> >>>> The query in question is:
> >>>>
> >>>> select count(*)
> >>>> from store_sales
> >>>> ,household_demographics
> >>>> ,time_dim, store
> >>>> where ss_sold_time_sk = time_dim.t_time_sk
> >>>> and ss_hdemo_sk = household_demographics.hd_demo_sk
> >>>> and ss_store_sk = s_store_sk
> >>>> and time_dim.t_hour = 15
> >>>> and time_dim.t_minute >= 30
> >>>> and household_demographics.hd_dep_count = 7
> >>>> and store.s_store_name = 'ese'
> >>>> order by count(*)
> >>>> limit 100;
> >>>>
> >>>> From debugging output it looks like this is the plan being chosen
> >>>> (cheapest total path):
> >>>> Gather(store_sales household_demographics time_dim)
> rows=60626
> >>>> cost=3145.73..699910.15
> >>>> HashJoin(store_sales household_demographics time_dim)
> >>>> rows=25261 cost=2145.73..692847.55
> >>>> clauses: store_sales.ss_hdemo_sk =
> >>>> household_demographics.hd_demo_sk
> >>>> HashJoin(store_sales time_dim) rows=252609
> >>>> cost=1989.73..692028.08
> >>>> clauses: store_sales.ss_sold_time_sk =
> >>>> time_dim.t_time_sk
> >>>> SeqScan(store_sales) rows=11998564
> >>>> cost=0.00..658540.64
> >>>> SeqScan(time_dim) rows=1070
> >>>> cost=0.00..1976.35
> >>>> SeqScan(household_demographics) rows=720
> >>>> cost=0.00..147.00
> >>>>
> >>>> prepare_sort_from_pathkeys fails to find a pathkey because
> >>>> tlist_member_ignore_relabel returns null -- which seemed weird because
> >>>> the sortexpr is an Aggref (in a single member equivalence class) and
> >>>> the tlist contains a single member that's also an Aggref. It turns out
> >>>> that the only difference between the two Aggrefs is that the tlist
> >>>> entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has
> >>>> aggsplit = AGGSPLIT_SIMPLE.
> >>>>
> >>>> That's as far as I've gotten so far, but I figured I'd get that info
> >>>> out to see if it means anything obvious to anyone else.
> >>>
> >>> This really goes back to [1] where we fixed a similar issue by making
> >>> find_em_expr_usable_for_sorting_rel parallel the rules in
> >>> prepare_sort_from_pathkeys.
> >>>
> >>> Most of those conditions got copied, and the case we were trying to
> >>> handle is the fact that prepare_sort_from_pathkeys can generate a
> >>> target list entry under those conditions if one doesn't exist. However
> >>> there's a further restriction there I don't remember looking at: it
> >>> uses pull_var_clause and tlist_member_ignore_relabel to ensure that
> >>> all of the vars that feed into the sort expression are found in the
> >>> target list. As I understand it, that is: it will build a target list
> >>> entry for something like "md5(column)" if "column" (and that was one
> >>> of our test cases for the previous fix) is in the target list already.
> >>>
> >>> But there's an additional detail here: the call to pull_var_clause
> >>> requests aggregates, window functions, and placeholders be treated as
> >>> vars. That means for our Aggref case it would require that the two
> >>> Aggrefs be fully equal, so the differing aggsplit member would cause a
> >>> target list entry not to be built, hence our error here.
> >>>
> >>> I've attached a quick and dirty patch that encodes that final rule
> >>> from prepare_sort_from_pathkeys into
> >>> find_em_expr_usable_for_sorting_rel. I can't help but think that
> >>> there's a cleaner way to do with this with less code duplication, but
> >>> hindering that is that prepare_sort_from_pathkeys is working with a
> >>> TargetList while find_em_expr_usable_for_sorting_rel is working with a
> >>> list of expressions.
> >>>
>
> Yeah, I think it'll be difficult to reuse code from later planner stages
> exactly because it operates on different representation. So something
> like your patch is likely necessary.
>
> As for the patch, I have a couple comments:
>
> 1) expr_list_member_ignore_relabel would deserve a better comment, and
> maybe a reference to tlist_member_ignore_relabel which it copies
>
> 2) I suppose the comment before "if (ec->ec_has_volatile)" needs
> updating, because now it says we're done as long as the expression is
> not volatile (but we're doing more stuff).
>
> 3) Shouldn't find_em_expr_usable_for_sorting_rel now mostly mimic what
> prepare_sort_from_pathkeys does? That is, try to match the entries
> directly first, before the new pull_vars() business?
>
> 4) I've simplified the foreach() loop a bit. prepare_sort_from_pathkeys
> does it differently, but that's because there are multiple foreach
> levels, I think. Yes, we'll not free the list, but I that's what most
> other places in planner do ...
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Hi,

if (!expr_list_member_ignore_relabel(lfirst(k), target->exprs))
- break;
+ return NULL;

I think it would be better if list_free(exprvars) is called before the
return.

Cheers

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-04-16 02:52:18 Re: Replication slot stats misgivings
Previous Message Justin Pryzby 2021-04-16 02:07:38 Re: Converting built-in SQL functions to new style