Re: POC: GROUP BY optimization

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Белялов Дамир Наилевич <d(dot)belyalov(at)postgrespro(dot)ru>
Subject: Re: POC: GROUP BY optimization
Date: 2024-01-26 15:09:04
Message-ID: CA+TgmoaAGPSzcetRQyBywVk=qSJjBpta5=Qz-1j2XJ5BkgkAVA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 26, 2023 at 10:23 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I think it's a fool's errand to even try to separate different sort
> column orderings by cost. We simply do not have sufficiently accurate
> cost information. The previous patch in this thread got reverted because
> of that (well, also some implementation issues, but mostly that), and
> nothing has happened to make me think that another try will fare any
> better.

I'm late to the party, but I'd like to better understand what's being
argued here. If you're saying that, for some particular planner
problem, we should prefer a solution that doesn't need to know about
the relative cost of various sorts over one that does, I agree, for
exactly the reason that you state: our knowledge of sort costs won't
be reliable, and we will make mistakes. That's true in lots of
situations, not just related to sorts,
because estimation is a hard problem. Heuristics not based on cost are
going to be, in many cases, more accurate than heuristics based on
cost. They're also often cheaper, since they often let us reject
possible approaches very early, without all the bother of a cost
comparison.

But if you're saying that it's utterly impossible to know whether
sorting text will be cheaper or more expensive than sorting 4-byte
integers, and that if a particular problem can be solved only by
knowing which one is cheaper we should just give up, then I disagree.
In the absence of any other information, it must be right, at the very
least, to bank on varlena data types being more expensive to sort than
fixed-length data types. How much more expensive is hard to know,
because toasted blobs are going to be more expensive to sort than
short varlenas. But even before you reach the comparison function, a
pass-by-value datum has a significantly lower access cost than a
pass-by-reference datum. The fact that the pass-by-reference value
might be huge only compounds the problem.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-01-26 15:12:22 Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
Previous Message David G. Johnston 2024-01-26 15:08:29 Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row