Quick Links

Re: cost_sort() may need to be updated

From:	Peter Geoghegan <pg(at)heroku(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: cost_sort() may need to be updated
Date:	2016-09-13 16:27:24
Message-ID:	CAM3SWZQxyB6CPu8qsPAuYoEJmb=0EdM5sqRpsVYW4Hp_LgdNUQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Sep 13, 2016 at 5:59 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think that the only real way to judge whether cost_sort() is any
> good is to see whether it causes the wrong plan to be chosen in some
> cases. For example, it could cause merge joins to be picked too
> frequently or too infrequently, or it could cause a wrong decision
> between hash and group aggregates. Now, there is bound to be an
> argument about whether any test cases that we might pick are
> representative and the results will further differ between hardware
> platforms, but I think changing the formula based on an abstract idea
> about what's happening is probably not a good idea.

I don't disagree with any of this. And, I don't pretend to have the
best understanding of how real world observations on the quality of
plans can make its way back into the query plan, and be integrated,
without regressing too many other things.

The basis of my complaint is that cost_sort() typically costs external
sorts as twice or more as expensive as internal sorts, and that that
does not seem at all consistent with my *recent* observations, without
even bringing caching effects into it. That seems safe enough for
someone that is relatively unfamiliar with the optimizer to have as a
general concern, because external sorts and internal sorts really
aren't so different these days.

> Because it's necessary to set work_mem so conservatively to avoid
> running out of memory, most supposedly-external sorts aren't really
> doing any I/O at all, and therefore arguing about whether we should be
> charging seq_page_cost or random_page_cost for I/O is kinda missing
> the point.

I don't take issue with how internal sorts are costed at all. I think
that the factors that direct the tuning of work_mem might prominently
include the anti-scaling effect of replacement selection. To the DBA,
that can just look like simple memory pressure, perhaps. Time will
tell how true this is.

> Those numbers may just be reflecting other work that the
> sort is doing that isn't being properly accounted for elsewhere, or
> it's possible that the current costing of sorts is fairly far away
> from reality. How would we know? The only thing that makes me think
> they probably aren't *too* bad is that in a somewhat-systematic study
> of query performance problems
> (https://sites.google.com/site/robertmhaas/query-performance) done
> several years ago I found no evidence of a problem with sort costing,
> and my experience outside of that study bears that out. But that's
> pretty back of the envelope.

It's also out of date. As I said, we don't have any real field
experience with the 9.6 external sort changes yet, but we'll have some
soon enough.

I'm happy to leave it at that, for now -- maybe someone will recall
this thread at a later date, when there is a complaint about a
suboptimal query plan from a user.

--
Peter Geoghegan

In response to

Re: cost_sort() may need to be updated at 2016-09-13 12:59:57 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jeff Janes	2016-09-13 16:31:22	Re: Hash Indexes
Previous Message	Andres Freund	2016-09-13 16:25:05	Re: Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)