Re: POC: GROUP BY optimization

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Белялов Дамир Наилевич <d(dot)belyalov(at)postgrespro(dot)ru>
Subject: Re: POC: GROUP BY optimization
Date: 2024-01-26 15:38:23
Message-ID: 783171.1706283503@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Dec 26, 2023 at 10:23 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I think it's a fool's errand to even try to separate different sort
>> column orderings by cost. We simply do not have sufficiently accurate
>> cost information. The previous patch in this thread got reverted because
>> of that (well, also some implementation issues, but mostly that), and
>> nothing has happened to make me think that another try will fare any
>> better.

> I'm late to the party, but I'd like to better understand what's being
> argued here.

What I am saying is that we don't have sufficiently accurate cost
information to support the sort of logic that got committed and
reverted before. I did not mean to imply that it's not possible
to have such info, only that it is not present today. IOW, what
I'm saying is that if you want to write code that tries to make
a cost-based preference of one sorting over another, you *first*
need to put in a bunch of legwork to create more accurate cost
numbers. Trying to make such logic depend on the numbers we have
today is just going to result in garbage in, garbage out.

Sadly, that's not a small task:

* We'd need to put effort into assigning more realistic procost
values --- preferably across the board, not just comparison functions.
As long as all the comparison functions have procost 1.0, you're
just flying blind.
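To make that concrete, here is a minimal sketch (the function and variable names are mine, not PostgreSQL's; the 1.0 procost and 0.0025 cpu_operator_cost are the current defaults) of why uniform procost makes any cost-based ranking of sort-column orderings degenerate:

```python
# Sketch: with every comparison function at the default procost of 1.0,
# the planner's per-comparison charge (procost * cpu_operator_cost) is
# identical for every datatype, so orderings cannot be distinguished.
CPU_OPERATOR_COST = 0.0025  # default value of the cpu_operator_cost GUC

# btint4cmp and bttextcmp are real pg_proc entries; their procost is 1.0
# today, like essentially all comparison functions.
procost = {"btint4cmp": 1.0, "bttextcmp": 1.0}

def per_comparison_cost(cmp_func):
    # Cost charged per comparison-function call.
    return procost[cmp_func] * CPU_OPERATOR_COST

# Sorting by (int4, text) vs. (text, int4): the estimates tie exactly,
# even though comparing text is far more expensive in reality.
cost_a = per_comparison_cost("btint4cmp") + per_comparison_cost("bttextcmp")
cost_b = per_comparison_cost("bttextcmp") + per_comparison_cost("btint4cmp")
assert cost_a == cost_b
```

Until the procost values actually differ across comparison functions, any preference the planner expresses between orderings is noise, not signal.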

* As you mentioned, there'd need to be some accounting for the
likely size of varlena inputs, and especially whether they might
be toasted.

* cost_sort knows nothing of the low-level sort algorithm improvements
we've made in recent years, such as abbreviated keys.
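As a simplified sketch of that gap (the function name is mine; the formula follows the current comparison-cost charge in costsize.c as I understand it): cost_sort charges a flat 2 * cpu_operator_cost per comparison, times N log2 N comparisons, with no term for abbreviated keys, datatype, or input width:

```python
import math

DEFAULT_CPU_OPERATOR_COST = 0.0025  # default GUC value

def sort_comparison_cost(tuples, cpu_operator_cost=DEFAULT_CPU_OPERATOR_COST):
    """Comparison-cost portion of an in-memory sort estimate:
    2 * cpu_operator_cost per comparison, N * log2(N) comparisons.
    Nothing here varies with the sort key's datatype or width, so
    abbreviated keys, toasting, etc. are invisible to the estimate."""
    if tuples < 2:
        return 0.0
    comparison_cost = 2.0 * cpu_operator_cost
    return comparison_cost * tuples * math.log2(tuples)

print(sort_comparison_cost(1000.0))
```

The estimate comes out the same no matter which column ordering is sorted first, which is exactly why building ordering preferences on top of it yields garbage in, garbage out.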

That's a lot of work, and I think it has to be done before we try
to build infrastructure on top, not afterwards.

regards, tom lane
