From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: cost_sort() may need to be updated
Date: 2016-09-13 12:59:57
Message-ID: CA+TgmoZQK1fZ2o02uuCZAgEePvsycgOWk=0vTPrBNt75HvyhKw@mail.gmail.com
Lists: pgsql-hackers
On Sun, Sep 11, 2016 at 12:13 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Sun, Sep 11, 2016 at 9:01 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Peter Geoghegan <pg(at)heroku(dot)com> writes:
>>> I think that we *can* refine this guess, and should, because random
>>> I/O is really quite unlikely to be a large cost these days (I/O in
>>> general often isn't a large cost, actually). More fundamentally, I
>>> think it's a problem that cost_sort() thinks that external sorts are
>>> far more expensive than internal sorts in general. There is good
>>> reason to think that that does not reflect the reality. I think we can
>>> expect external sorts to be *faster* than internal sorts with
>>> increasing regularity in Postgres 10.
>>
>> TBH, if that's true, haven't you broken something?
>
> It's possible for external sorts to be faster some of the time because
> the memory access patterns can be more cache efficient: smaller runs
> are better when accessing tuples in sorted order, scattered across
> memory. More importantly, the sort can start returning tuples earlier
> in the common case where a final on-the-fly merge can be performed. In
> principle, you could adapt internal sorts to have the same advantages,
> but that hasn't happened and probably won't. Finally, the external sort
> I/O costs grow linearly, whereas the CPU costs grow in a linearithmic
> fashion, which will eventually come to dominate. We can hide the
> latency of those costs pretty well, too, with asynchronous I/O.
>
> I'm not arguing that cost_sort() should think that external sorts are
> cheaper under any circumstances, since all of this is very hard to
> model. I only mention this because it illustrates nicely that
> cost_sort() has the wrong idea.
I think that the only real way to judge whether cost_sort() is any
good is to see whether it causes the wrong plan to be chosen in some
cases. For example, it could cause merge joins to be picked too
frequently or too infrequently, or it could cause a wrong decision
between hash and group aggregates. Now, there is bound to be an
argument about whether any test cases we might pick are
representative, and the results will differ between hardware
platforms, but I think changing the formula based purely on an
abstract notion of what's happening is probably not a good idea.
Because it's necessary to set work_mem so conservatively to avoid
running out of memory, most supposedly-external sorts aren't really
doing any I/O at all, and therefore arguing about whether we should be
charging seq_page_cost or random_page_cost for I/O is kinda missing
the point. Those numbers may just be reflecting other work that the
sort is doing that isn't being properly accounted for elsewhere, or
it's possible that the current costing of sorts is fairly far away
from reality. How would we know? The only thing that makes me think
they probably aren't *too* bad is that, in a somewhat systematic study
of query performance problems
(https://sites.google.com/site/robertmhaas/query-performance) done
several years ago, I found no evidence of a problem with sort costing,
and my experience outside of that study bears that out. But that's
pretty back-of-the-envelope.
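For concreteness, here is a rough sketch in C of the external-sort I/O
charge I'm talking about. It paraphrases, from memory, what cost_sort()
in costsize.c does today; the run-count estimate is simplified and the
constants are approximate, so treat it as an illustration rather than a
copy of the real code:

#include <math.h>
#include <stdio.h>

/*
 * Rough paraphrase, from memory, of the disk-cost part of cost_sort():
 * the number of page accesses scales with the number of merge passes,
 * and each access is charged a blend of sequential and random page
 * cost.  The run-size estimate is simplified; see costsize.c for the
 * real thing.
 */
static double
external_sort_io_cost(double input_bytes, double sort_mem_bytes,
                      double merge_order,
                      double seq_page_cost, double random_page_cost)
{
    double      npages = ceil(input_bytes / 8192.0);    /* BLCKSZ-sized pages */
    double      nruns = input_bytes / sort_mem_bytes;   /* initial runs (simplified) */
    double      log_runs;
    double      npageaccesses;

    if (nruns > merge_order)
        log_runs = ceil(log(nruns) / log(merge_order)); /* merge passes */
    else
        log_runs = 1.0;

    npageaccesses = 2.0 * npages * log_runs;            /* each pass writes and reads */

    /* charge roughly 3/4 of the accesses as sequential, 1/4 as random */
    return npageaccesses * (seq_page_cost * 0.75 + random_page_cost * 0.25);
}

int
main(void)
{
    /* e.g. 1GB of input, 64MB of sort memory, merge order 500, default page costs */
    printf("estimated I/O cost: %.0f\n",
           external_sort_io_cost(1e9, 64e6, 500.0, 1.0, 4.0));
    return 0;
}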
Of course, the issue of supposed I/O actually being reads of cached
data is not unique to sorting. Sequential scans have the same
problem. We've previously discussed the idea of adding a
cached_page_cost, but this has fallen down over the issue that you
also need to know what percentage of the table is likely to be read
from cache, and you don't want that estimate to be too unstable or
your plans will bounce all over the place. One idea I had for dealing
with that is to just base it on the size of the table: assume small
tables are probably cached, and large tables probably aren't. The
latter is certainly true at least in the case where "large" means
"bigger than physical memory", and the former isn't a bad guess
either. So basically where this would go is to assume that scanning a
small table is cheaper per page than scanning a big one, which I
suspect is probably better than the uniform estimate we're using
today. However, I haven't got a single test case to prove it, and in
fact I'm not even sure how one could go about figuring out whether
this makes things better or worse in general.
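To make the shape of that idea concrete, here's a hedged sketch in C.
cached_page_cost is the hypothetical GUC mentioned above, and the
thresholds, the ramp, and the default value plugged in below are all
invented for illustration rather than a concrete proposal:

#include <math.h>
#include <stdio.h>

/*
 * Hypothetical per-page cost for scanning a table, blending a made-up
 * cached_page_cost with seq_page_cost based on the table's size
 * relative to the cache.  Small tables are assumed fully cached,
 * tables much larger than the cache are assumed uncached, with a ramp
 * in between.  The thresholds and the shape of the ramp are purely
 * illustrative.
 */
static double
per_page_scan_cost(double table_pages, double cache_pages,
                   double cached_page_cost, double seq_page_cost)
{
    double      cached_fraction;

    if (table_pages <= cache_pages * 0.25)
        cached_fraction = 1.0;          /* small: assume fully cached */
    else if (table_pages >= cache_pages * 4.0)
        cached_fraction = 0.0;          /* huge: assume nothing is cached */
    else
        cached_fraction = 1.0 -
            log(table_pages / (cache_pages * 0.25)) / log(16.0);

    return cached_fraction * cached_page_cost +
        (1.0 - cached_fraction) * seq_page_cost;
}

int
main(void)
{
    double      cache_pages = 524288.0; /* roughly 4GB of 8kB pages */
    double      pages;

    for (pages = 1024.0; pages <= 8388608.0; pages *= 8.0)
        printf("%10.0f pages -> %.3f per page\n",
               pages, per_page_scan_cost(pages, cache_pages, 0.05, 1.0));
    return 0;
}

The log-scale ramp is just one way of keeping the estimate from jumping
abruptly as a table grows past some threshold; any bounded, monotonic
function would serve the same purpose.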
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company