Quick Links

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From:	James Coleman <jtc331(at)gmail(dot)com>
To:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject:	Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date:	2020-03-31 18:23:15
Message-ID:	CAAaqYe8M-r4esHOg_46w3ojeUEB7KGZ4j2GAi6rLv0B9bHoiPg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, Mar 30, 2020 at 9:14 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> The main thing I've been working on today is benchmarking how this
> affects planning. And I'm seeing a regression that worries me a bit,
> unfortunately.
>
> The test I'm doing is pretty simple - build a small table with a bunch
> of columns:
>
> create table t (a int, b int, c int, d int, e int, f int, g int);
>
> insert into t select 100*random(), 100*random(), 100*random(),
> 100*random(), 100*random(), 100*random(), 100*random()
> from generate_series(1,100000) s(i);
>
> and then a number of indexes on subsets of up to 3 columns, as generated
> using the attached build-indexes.py script. And then run a bunch of
> explains (so no actual execution) sorting the data by at least 4 columns
> (to trigger incremental sort paths), measuring timing of the script.
>
> I did a bunch of runs on current master and v46 with incremental sort
> disabled and enabled, and the results look like this:
>
> master off on
> --------------------------
> 34.609 37.463 37.729
>
> which means about 8-9% regression with incremental sort. Of course, this
> is only for planning time, for execution the impact is going to be much
> smaller. But it's still a bit annoying.
>
> I've suspected this might be either due to the add_partial_path changes
> or the patch adding incremental sort to additional places, so I tested
> those parts individually and the answer is no - add_partial_path changes
> have very small impact (~1%, which might be noise). The regression comes
> mostly from the 0002 part that adds incremental sort. At least in this
> particular test - different tests might behave differently, of course.
>
> The annoying bit is that the overhead does not disappear after disabling
> incremental sort. That suggests this is not merely due to considering
> and creating higher number of paths, but due to something that happens
> before we even look at the GUC ...
>
> I think I've found one such place - if you look at compare_pathkeys, it
> has this check right before the forboth() loop:
>
> if (keys1 == keys2)
> return PATHKEYS_EQUAL;
>
> But with incremental sort we don't call pathkeys_contained_in, we call
> pathkeys_common_contained_in instead. And that does not call
> compare_pathkeys and does not have the simple equality check. Adding
> the following check seems to cut the overhead in half, which is nice:
>
> if (keys1 == keys2)
> {
> *n_common = list_length(keys1);
> return true;
> }
>
> Not sure where the rest of the regression comes from yet.

I noticed in the other function we also optimize by checking if either
keys list is NIL, so I tried adding that, and it might have made a
minor difference, but it's hard to tell as it was under 1%.

I also ran perf with a slightly modified version of your test that
uses psql, and after the above changes was seeing something like a
3.5% delta between master and this patch series. Nothing obvious in
the perf report though.

This test is intended to be somewhat worst case, no? At what point do
we consider the trade-off worth it (given that it's not plausible to
have zero impact)?

> Also, while looking at pathkeys_common_contained_in(), I've been a bit
> puzzled why does is this correct:
>
> return (key1 == NULL);
>
> It's easy to not notice it's key1 and not keys1, so I suggest we add a
> comment 'key1 == NULL' means we've processed whole keys1 list.

Done.

I've included fixes for Alvaro's comments in this patch series also.

James

Attachment	Content-Type	Size
v48-0003-Consider-incremental-sort-paths-in-additional-pl.patch	text/x-patch	24.2 KB
v48-0005-typo.patch	text/x-patch	717 bytes
v48-0001-Consider-low-startup-cost-when-adding-partial-pa.patch	text/x-patch	4.6 KB
v48-0002-Implement-incremental-sort.patch	text/x-patch	167.5 KB
v48-0004-optimization.patch	text/x-patch	1.4 KB
v48-0006-enum-bitmask-style.patch	text/x-patch	951 bytes
v48-0007-bits32.patch	text/x-patch	1.3 KB

In response to

Re: [PATCH] Incremental sort (was: PoC: Partial sort) at 2020-03-31 01:14:07 from Tomas Vondra

Responses

Re: [PATCH] Incremental sort (was: PoC: Partial sort) at 2020-03-31 21:53:40 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Fujii Masao	2020-03-31 18:37:58	Re: recovery_target_action=pause with confusing hint
Previous Message	Bruce Momjian	2020-03-31 18:18:31	Re: Ecpg dependency