Re: Ordered Partitioned Table Scans

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Ordered Partitioned Table Scans
Date: 2019-03-22 15:28:51
Message-ID: CANP8+j+mOKCsY2-A-pq7Dq8sFMNhwwYA9eKo8s8EadFi=s-MPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 22 Mar 2019 at 11:12, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> > On Sat, 9 Mar 2019 at 10:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> This can be a huge win for queries of the form "ORDER BY partkey LIMIT
> >>> x". Even if the first subpath(s) aren't natively ordered, not all of
> >>> the sorts should actually be performed.
>
> >> [ shrug... ] We've got no realistic chance of estimating such situations
> >> properly, so I'd have no confidence in a plan choice based on such a
> >> thing. Nor do I believe that this case is all that important.
>
> > Wondering if you can provide more details on why you think there's no
> > realistic chance of the planner costing these cases correctly?
>
> The reason why I'm skeptical about LIMIT with a plan of the form
> append-some-sorts-together is that there are going to be large
> discontinuities in the cost-vs-number-of-rows-returned graph,
> ie, every time you finish one child plan and start the next one,
> there'll be a hiccup while the sort happens. This is something
> that our plan cost approximation (startup cost followed by linear
> output up to total cost) can't even represent. Sticking a
> LIMIT on top of that is certainly not going to lead to any useful
> estimate of the actual cost, meaning that if you get a correct
> plan choice it'll just be by luck, and if you don't there'll be
> nothing to be done about it.
>
> If we don't incorporate that, then the situation that the planner
> will have to model is a MergeAppend with possibly some sorts in
> front of it, and it will correctly cost that as if all the sorts
> happen before any output occurs, and so you can hope that reasonable
> plan choices will ensue.
>
> I believe that the cases where a LIMIT query actually ought to go
> for a fast-start plan are where *all* the child rels have fast-start
> (non-sort) paths --- which is exactly the cases we'd get if we only
> allow "sorted" Appends when none of the inputs require a sort.
> Imagining that we'll get good results in cases where some of them
> need to be sorted is just wishful thinking.
>
> > It would be unfortunate to reject this patch based on a sentence that
> > starts with "[ shrug... ]", especially so when three people have stood
> > up and disagreed with you.
>
> I don't want to reject the patch altogether, I just want it to not
> include a special hack to allow Append rather than MergeAppend in such
> cases. I am aware that I'm probably going to be shouted down on this
> point, but that will not change my opinion that the shouters are wrong.
>

I agree that the issue of mixing sorts at various points will make nonsense
of the startup cost/total cost results.

What you say about LIMIT is exactly right. But ISTM that LIMIT itself is
the issue there and it need more smarts to correctly calculate costs.

I don't see LIMIT costing being broken as a reason to restrict this
optimization. I would ask that we allow improvements to the important use
case of ORDER BY/LIMIT, then spend time on making LIMIT work correctly.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Christoph Berg 2019-03-22 15:30:12 Re: Offline enabling/disabling of data checksums
Previous Message Peter Eisentraut 2019-03-22 15:27:24 Re: [proposal] Add an option for returning SQLSTATE in psql error message