Re: enable_incremental_sort changes query behavior

From: James Coleman <jtc331(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: enable_incremental_sort changes query behavior
Date: 2020-10-07 20:01:27
Message-ID: CAAaqYe_Oe=tRBhJKBbxz3bPygQ-5RpgcF8XO-UQzV1_Fu_w+9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 7, 2020 at 3:52 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331(at)gmail(dot)com> wrote:
> > The plan (obtained by replacing the volatile function with a stable one):
> >
> > Unique
> > -> Nested Loop
> > -> Gather Merge
> > Workers Planned: 2
> > -> Sort
> > Sort Key: ref_0.i, (md5(ref_0.t))
> > -> Parallel Index Scan using ref_0_i_idx on ref_0
> > -> Function Scan on generate_series ref_1
> >
> > Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
>
> As far as I remember this stuff, target lists for any nodes below the
> top of the join tree were previously always just Var nodes. Any
> expressions that needed to be computed were computed at the level of
> the topmost join, before adding upper planner steps like Agg,
> WindowAgg, Limit, etc. But, here you've got a complex expression --
> md5(ref_0.t) -- begin computed somewhere in the middle of the plan
> tree so that the sort can be performed at that point. However, that
> seems likely to break a bunch of assumptions all over the planner,
> because, unless a lot of work and surgery has been done since I looked
> at this last, that md5(ref_0.t) value is not going to "bubble up" to
> higher levels of the plan through the tlists of the individual nodes.
> That poses a risk that it will be recomputed multiple times, which is
> particularly bad for volatile functions because you might not always
> get the same answer, but it seems like it might be bad even for
> non-volatile functions, because it could be slow if the value is used
> at multiple places in the query. It feels like the sort of thing that
> might have broken some other assumptions, too, although I can't say
> exactly what.

The distinction between volatile and stable expressions here has been
discussed at length in this thread; I think it'd be worth reading
through the rest of the thread before offering sledgehammer ideas
below like reverting the entire piece.

In particular, most of the discussion in the thread has been about
what is actually safe to reference at lower levels in the plan. It
would be extremely easy tomake this only allow sorting by Vars at
lower levels in the plan, though the current revision of the patch
follows what prepare_sort_from_pathkeys allows and disallows.

Obviously it'd be nice to be able to sort by non-Var expressions at
lower levels too. But again, it'd be easy to change that if it's not
safe, so it'd be helpful if you could point to areas that don't bubble
non-Vars up so that we can actually comment about that in the source
and discuss specifics.

> I wonder if whatever part of the incremental sort patch allowed
> computation of tlist expressions below the top level of the join tree
> ought to just be reverted outright, but that might be overkill. I
> can't say that I really understand exactly what's going on here, but
> it seems like a pretty significant behavior change to make as part of
> another project, and I wonder if we're going to keep finding other
> problems with it.

James

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2020-10-07 20:15:12 Re: Recent failures on buildfarm member hornet
Previous Message Robert Haas 2020-10-07 19:52:04 Re: enable_incremental_sort changes query behavior