Re: POC: GROUP BY optimization

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
Subject: Re: POC: GROUP BY optimization
Date: 2024-02-21 08:08:12
Message-ID: CAMbWs49oMEFUEy6RRaenx5ak4AuZzuDEDzi0otEBYwJzBBYz7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 2, 2024 at 12:40 PM Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
wrote:

> On 2/2/2024 11:06, Richard Guo wrote:
> >
> > On Fri, Feb 2, 2024 at 11:32 AM Richard Guo <guofenglinux(at)gmail(dot)com
> > <mailto:guofenglinux(at)gmail(dot)com>> wrote:
> >
> > On Fri, Feb 2, 2024 at 10:02 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us
> > <mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us>> wrote:
> >
> > One of the test cases added by this commit has not been very
> > stable in the buildfarm. Latest example is here:
> >
> >
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04
> <
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04
> >
> >
> > and I've seen similar failures intermittently on other machines.
> >
> > I'd suggest building this test atop a table that is more stable
> > than pg_class. You're just waving a red flag in front of a bull
> > if you expect stable statistics from that during a regression
> run.
> > Nor do I see any particular reason for pg_class to be especially
> > suited to the test.
> >
> >
> > Yeah, it's not a good practice to use pg_class in this place. While
> > looking through the test cases added by this commit, I noticed some
> > other minor issues that are not great. Such as
> >
> > * The table 'btg' is inserted with 10000 tuples, which seems a bit
> > expensive for a test. I don't think we need such a big table to test
> > what we want.
> >
> > * I don't see why we need to manipulate GUC max_parallel_workers and
> > max_parallel_workers_per_gather.
> >
> > * I think we'd better write the tests with the keywords being all
> upper
> > or all lower. A mixed use of upper and lower is not great. Such as
> in
> >
> > explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
> >
> > * Some comments for the test queries are not easy to read.
> >
> > * For this statement
> >
> > CREATE INDEX idx_y_x_z ON btg(y,x,w);
> >
> > I think the index name would cause confusion. It creates an index on
> > columns y, x and w, but the name indicates an index on y, x and z.
> >
> > I'd like to write a draft patch for the fixes.
> >
> >
> > Here is the draft patch that fixes the issues I complained about in
> > upthread.
>

> I passed through the patch. Looks like it doesn't break anything. Why do
> you prefer to use count(*) in EXPLAIN instead of plain targetlist, like
> "SELECT x,y,..."?

Nothing special. Just making the test cases consistent as much as
possible.

> Also, according to the test mentioned by Tom:
> 1. I see, PG uses IndexScan on (x,y). So, column x will be already
> sorted before the MergeJoin. Why not use Incremental Sort on (x,z,w)
> instead of full sort?

I think that's because the planner chooses to use (z, w, x) to perform
the mergejoin. I did not delve into the details, but I guess the cost
estimation decides this is cheaper.

Hi Alexander,

What do you think about the revisions for the test cases?

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michail Nikolaev 2024-02-21 08:35:40 Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Previous Message Bertrand Drouvot 2024-02-21 08:07:47 Re: Injection points: some tools to wait and wake