Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2019-09-28 22:55:19
Message-ID: 20190928225519.mnhuhccw5a4ymnae@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 27, 2019 at 08:31:30PM -0400, James Coleman wrote:
>On Fri, Sep 13, 2019 at 10:51 AM Tomas Vondra
><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> On Thu, Sep 12, 2019 at 12:49:29PM -0300, Alvaro Herrera wrote:
>> >On 2019-Jul-30, Tomas Vondra wrote:
>> >> I've decided to do a couple of experiments, trying to make my mind about
>> >> which modified places matter to diffrent queries. But instead of trying
>> >> to reverse engineer the queries, I've taken a different approach - I've
>> >> compiled a list of queries that I think are sensible and relevant, and
>> >> then planned them with incremental sort enabled in different places.
>> >
>> >[...]
>> >
>> >> The list of queries (synthetic, but hopefully sufficiently realistic)
>> >> and a couple of scripts to collect the plans is in this repository:
>> >>
>> >> https://github.com/tvondra/incremental-sort-tests-2
>> >>
>> >> There's also a spreadsheet with a summary of results, with a visual
>> >> representation of which GUCs affect which queries.
>> >
>> >OK, so we have that now. I suppose this spreadsheet now tells us which
>> >places are useful and which aren't, at least for the queries that you've
>> >tested. Dowe that mean that we want to get the patch to consider adding
>> >paths only the places that your spreadsheet says are useful? I'm not
>> >sure what the next steps are for this patch.
>> >
>>
>> Yes. I think the spreadsheet call help us with answering two things:
>>
>> 1) places actually affecting the plan (all but three do)
>>
>> 2) redundant places (there are some cases where two GUCs produce the
>> same plan in the end)
>
>To expand on this further, (1) should probably help us to be able to
>write test cases.
>
>Additionally, one big thing we still need that's somewhat external to
>the patch is a good way to benchmark/a set of queries that we believe
>are representative enough to be good benchmarks.
>
>I'd really appreciate some input from you all on that particular
>question; I feel like it's in some sense the biggest barrier to
>getting the patch merged, but also the part where long experience in
>the community/exposure to other use cases will probably be quite
>valuable.
>

Hmmm. I don't think anyone will hand us a set of representative queries,
so I think we have two options:

1) Generate synthetic queries covering a wide range of cases (both when
incremental sort is expected to help and not). I think the script I've
used to determine which places do matter can be the starting point.

2) Look at some established benchmarks and see if some of the queries
could benefit from the incremental sort (possibly with some changes to
indexes in the usual schema). I plan to look at TPC-H / TPC-DS, but I
wonder if some OLTP benchmarks would be relevant too.

>> Of course, this does assume the query set makes sense and is somewhat
>> realistic, but I've tried to construct queries where that is true. We
>> may extend it over time, of course.
>>
>> I think we've agreed to add incremental sort paths different places in
>> separate patches, to make review easier. So this may be a useful way to
>> decide which places to address first. I'd probably do it in this order:
>>
>> - create_ordered_paths
>> - create_ordered_paths (parallel part)
>> - add_paths_to_grouping_rel
>> - ... not sure ...
>>
>> but that's just a proposal. It'd give us most of the benefits, I think,
>> and we could also focus on the rest of the patch.
>
>Certainly the first two seem like pretty obvious most necessary base
>cases. I think supporting group bys also seems like a pretty standard
>case, so at first glance I'd say this seems like a reasonable course
>to me.
>

OK.

>I'm going to start breaking up the patches in this thread into a
>series in support of that. Since I've started a new thread with the
>add_partial_path change, I'll include that patch here as part of this
>series also. Do you think it's worth moving the tuplesort changes into
>a standalone patch in the series also?
>

Probably. I'd do that at least for the review.

>Attached is a rebased v31 now broken into the following:
>
>- 001-consider-startup-cost-in-add-partial-path_v1.patch: From the
>other thread (Tomas's patch unmodified)
>- 002-incremental-sort_v31.patch: Updated base incremental sort patch
>
>Besides rebasing, I've changed the enable_incrementalsort GUC to
>prevent generating paths entirely rather than being cost-based, since
>incremental sort is never absolutely necessary in the way regular sort
>is.
>

OK, makes sense.

>I'm hoping to add 003 soon with the initial parallel parts, but I'm
>about out of time right now and wanted to get something out, so
>sending this without that.
>
>Side question: for the patch tester do I have to attach each part of
>the series each time even if nothing's changed in several of them? And
>does the vN number at the end need to stay the same for all of them?
>My attachments to this email don't follow that... Also, since this
>email changes patch naming, so I need to do anything to clear out the
>old ones? (I suppose if not, then that would imply an answer to the
>first question also.)
>

Please always send the whole patch series. Firstly, that's the only way
how the cfbot can apply and test the patches (it can't collect patches
scattered in different messages in the thread). Secondly, it's really
annoying for the reviewers to try to pick the matching bits.

Also, it's a good ide to always mark all parts with the same version
info, not v1 for one part and v31 for another one. I'd simply do
something like

git format-patch --suffix=-YYYYMMDD.patch master

or something like that.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-09-28 23:00:49 Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Previous Message Tomas Vondra 2019-09-28 22:37:41 Re: Consider low startup cost in add_partial_path