Re: Improve statistics estimation considering GROUP-BY as a 'uniqueiser'

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Improve statistics estimation considering GROUP-BY as a 'uniqueiser'
Date: 2025-02-18 12:52:15
Message-ID: b2e53e9d-366b-435b-a5dc-7c516f5dc65d@gmail.com
Lists: pgsql-hackers

On 17/2/2025 02:06, Alexander Korotkov wrote:
> On Thu, Nov 28, 2024 at 4:39 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>> Here we also could count number of scanned NULLs separately in
>> vardata_extra and use it in upper GROUP-BY estimation.
>
> What could be the type of vardata_extra? And what information could
> it store? Yet seems too sketchy for me to understand.
It is indeed sketchy. Our estimation routines have no information
about intermediate modifications of the data. NULLs generated by a left
join are a good example here. So, my vague idea is to maintain that info
and use it to adjust the statistical estimations somehow.
Of course, that is out of scope here.
>
> But, I think for now we should go with the original patch. It seems
> to be quite straightforward extension to what 4767bc8ff2 does. I've
> revised commit message and applied pg_indent to sources. I'm going to
> push this if no objections.
OK, I added one regression test to check that the feature works properly.

--
regards, Andrei Lepikhov

Attachment Content-Type Size
v3-0001-Improve-statistics-estimation-for-single-column-G.patch text/plain 8.3 KB
