Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-04-01 00:11:15
Message-ID: CAAaqYe9Vr_Nq_v+qeGFAkd2RmRrLXaUbzreS5BzPV9YgwPLN-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
> ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >>
> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
> >> >Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> >> >> In general, I think it'd be naive that we can make planner smarter with
> >> >> no extra overhead spent on planning, and we can never accept patches
> >> >> adding even tiny overhead. With that approach we'd probably end up with
> >> >> a trivial planner that generates just a single query plan, because
> >> >> that's going to be the fastest planner. A realistic approach needs to
> >> >> consider both the planning and execution phase, and benefits of this
> >> >> patch seem to be clear - if you have queries that do benefit from it.
> >> >
> >> >I think that's kind of attacking a straw man, though. The thing that
> >> >people push back on, or should push back on IMO, is when a proposed
> >> >patch adds significant slowdown to queries that it has no or very little
> >> >hope of improving. The trick is to do expensive stuff only when
> >> >there's a good chance of getting a better plan out of it.
> >> >
> >>
> >> Yeah, I agree with that. I think the main issue is that we don't really
> >> know what the "expensive stuff" is in this case, so it's not really
> >> clear how to be smarter :-(
> >
> >To add to this: I agree that ideally you'd check cheaply to know
> >you're in a situation that might help, and then do more work. But here
> >the question is always going to be simply "would we benefit from an
> >ordering, and, if so, do we have it already partially sorted". It's
> >hard to imagine that reducing much conceptually, so we're left with
> >optimizations of that check.
> >
>
> I think it depends on what exactly is the expensive part. For example if
> it's the construction of IncrementalSort paths, then maybe we could try
> do a quick/check check if the path can even be useful by estimating the
> cost and only then building the path.
>
> That's what we do for join algorithms, for example - we first compute
> initial_cost_nestloop and only when that seems cheap enough we do the
> more expensive stuff.
>
> But I'm not sure the path construction is the expensive part, as it
> should be disabled by enable_incrementalsort=off. But the regression
> does not seem to disappear, at least not entirely.
>
>
> >> One possibility is that it's just one of those regressions due to change
> >> in binary layout, but I'm not sure know how to verify that.
> >
> >If we are testing with a case that can't actually add more paths (due
> >to it checking the guc before building them), doesn't that effectively
> >leave one of these two options:
> >
> >1. Binary layout/cache/other untraceable change, or
> >2. Changes due to refactored function calls.
> >
>
> Hmm, but in case of (1) the overhead should be there even with tests
> that don't really have any additional paths to consider, right? I've
> tried with such test (single table with no indexes) and I don't quite
> see any regression (maybe ~1%).

Not necessarily, if the cost is in sort costing or useful pathkeys
checking, right? We have run that code even without incremental sort,
but it's changed from master.

> (2) might have impact, but I don't see any immediate supects. Did you
> have some functions in mind?

I guess this is where the lines blur: I didn't see anything obvious
either, but the changes to sort costing...should probably not have
real impact...but...

> BTW I see the patch adds pathkeys_common but it's not called from
> anywhere. It's probably leftover from an earlier patch version.
>
> >There's not anything obvious in point (2) that would be a big cost,
> >but there are definitely changes there. I was surprised that just
> >eliminating the loop through the pathkeys on the query and the index
> >was enough to save us ~4%.
> >
> >Tomas: Earlier you'd wondered about if we should try to shortcut the
> >changes in costing...I was skeptical of that originally, but maybe
> >it's worth looking into? I'm going to try backing that out and see
> >what the numbers look like.
> >

BTW, I did this test, and it looks like we can get back something
close to 1% by reverting that initial fix on partial path costing. But
we can't get rid of it all the time, at the very least. *Maybe* we
could condition it on incremental sort, but I'm not convinced that's
the only place it's needed as a fix.

> I've described the idea about something like initial_cost_nestloop and
> so on. But I'm a bit skeptical about it, considering that the GUC only
> has limited effect.
>
>
> A somewhat note is that the number of indexes has pretty significant
> impact on planning time, even on master. This is timing of the same
> explain script (similar to the one shown before) with different number
> of indexes on master:
>
> 0 indexes 7 indexes 49 indexes
> -------------------------------------------
> 10.85 12.56 27.83
>
> The way I look at incremental sort is that it allows using indexes for
> queries that couldn't use it before, possibly requiring a separate
> index. So incremental sort might easily reduce the number of indexes
> needed, compensating for the overhead we're discussing here. Of course,
> that may or may not be true.

One small idea, but I'm not yet sure it helps us a whole lot: if the
query pathkeys is only length 1, then we could skip the additional
path creation.

James

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-04-01 00:23:38 Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Previous Message Tomas Vondra 2020-03-31 23:56:19 Re: [PATCH] Incremental sort (was: PoC: Partial sort)