Re: POC: GROUP BY optimization

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Richard Guo <guofenglinux(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
Subject: Re: POC: GROUP BY optimization
Date: 2024-02-02 04:40:12
Message-ID: f5dae142-d351-4991-a2aa-92d12dcea004@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/2/2024 11:06, Richard Guo wrote:
>
> On Fri, Feb 2, 2024 at 11:32 AM Richard Guo <guofenglinux(at)gmail(dot)com
> <mailto:guofenglinux(at)gmail(dot)com>> wrote:
>
> On Fri, Feb 2, 2024 at 10:02 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us
> <mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us>> wrote:
>
> One of the test cases added by this commit has not been very
> stable in the buildfarm.  Latest example is here:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04 <https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04>
>
> and I've seen similar failures intermittently on other machines.
>
> I'd suggest building this test atop a table that is more stable
> than pg_class.  You're just waving a red flag in front of a bull
> if you expect stable statistics from that during a regression run.
> Nor do I see any particular reason for pg_class to be especially
> suited to the test.
>
>
> Yeah, it's not a good practice to use pg_class in this place.  While
> looking through the test cases added by this commit, I noticed some
> other minor issues that are not great.  Such as
>
> * The table 'btg' is inserted with 10000 tuples, which seems a bit
> expensive for a test.  I don't think we need such a big table to test
> what we want.
>
> * I don't see why we need to manipulate GUC max_parallel_workers and
> max_parallel_workers_per_gather.
>
> * I think we'd better write the tests with the keywords being all upper
> or all lower.  A mixed use of upper and lower is not great. Such as in
>
>     explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
>
> * Some comments for the test queries are not easy to read.
>
> * For this statement
>
>     CREATE INDEX idx_y_x_z ON btg(y,x,w);
>
> I think the index name would cause confusion.  It creates an index on
> columns y, x and w, but the name indicates an index on y, x and z.
>
> I'd like to write a draft patch for the fixes.
>
>
> Here is the draft patch that fixes the issues I complained about in
> upthread.
I passed through the patch. Looks like it doesn't break anything. Why do
you prefer to use count(*) in EXPLAIN instead of plain targetlist, like
"SELECT x,y,..."?
Also, according to the test mentioned by Tom:
1. I see, PG uses IndexScan on (x,y). So, column x will be already
sorted before the MergeJoin. Why not use Incremental Sort on (x,z,w)
instead of full sort?
2. For memo, IMO, this test shows us the future near perspective of this
feature: It is cheaper to use grouping order (w,z) instead of (z,w).

--
regards,
Andrei Lepikhov
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-02-02 04:58:05 Re: Synchronizing slots from primary to standby
Previous Message Peter Smith 2024-02-02 04:20:17 Re: Synchronizing slots from primary to standby