Re: Add ExprState hashing for GROUP BY and hashed SubPlans

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add ExprState hashing for GROUP BY and hashed SubPlans
Date: 2024-10-31 02:30:06
Message-ID: 6201c452-3506-4389-918c-a65568557e3f@gmail.com
Lists: pgsql-hackers

On 10/31/24 08:16, David Rowley wrote:
> On Tue, 29 Oct 2024 at 22:47, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>> I've attached an updated patch with a few other fixes. While checking
>> this tonight, I noticed that master does not use
>> SubPlanState.tab_eq_funcs for anything. I resisted removing that in
>> this patch. Perhaps a follow-on patch can remove that. I suspect it's
>> not been used for a long time now, but I didn't do the archaeology
>> work to find out.
>
> 3974bc319 removed the SubPlanState.tab_eq_funcs field, so here's a
> rebased patch.
Thanks for sharing this.
I still need to dive deeply into the code. But I have one annoying user
case in which the user complained that SQL Server was 4x faster than
Postgres, and I guess it is a good benchmark for your code.
This query is remarkable because of its high grouping computation load. Of
course, I can't provide the user's data, but I have prepared a synthetic
test to reproduce the case (see attachment).
Comparing master with and without your patch, the first thing I see is
more extensive memory usage (see the complete explains in the attachment):

Current master:
---------------

Partial HashAggregate (cost=54492.60..55588.03 rows=19917 width=889)
(actual time=20621.028..20642.664 rows=10176 loops=9)
Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
Batches: 1 Memory Usage: 74513kB

Patched:
--------

Partial HashAggregate (cost=54699.91..55799.69 rows=19996 width=889)
(actual time=57213.280..186216.604 rows=10302738 loops=9)
Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
Batches: 261 Memory Usage: 527905kB Disk Usage: 4832656kB
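
For a rough sense of scale, the two Partial HashAggregate nodes above can be compared directly. The following is only a minimal Python sketch of that arithmetic: the numbers are copied from the plans quoted above, while the dictionary keys and variable names are made up for illustration and do not come from the patch or the attachment.

```python
# Figures copied from the two Partial HashAggregate nodes quoted above.
# Key names are illustrative only.
master = {"time_ms": 20642.664, "rows": 10176, "mem_kb": 74513, "batches": 1}
patched = {
    "time_ms": 186216.604,
    "rows": 10302738,
    "mem_kb": 527905,
    "batches": 261,
    "disk_kb": 4832656,
}


def ratio(key):
    """Return the patched/master ratio for one reported metric."""
    return patched[key] / master[key]


time_ratio = ratio("time_ms")   # per-loop end time of the node
rows_ratio = ratio("rows")      # rows emitted by the partial aggregate
mem_ratio = ratio("mem_kb")     # reported "Memory Usage"
disk_gib = patched["disk_kb"] / (1024 * 1024)  # kB -> GiB

print(f"time:   {time_ratio:.2f}x")
print(f"rows:   {rows_ratio:.1f}x")
print(f"memory: {mem_ratio:.2f}x")
print(f"disk:   {disk_gib:.2f} GiB spilled over {patched['batches']} batches")
```

The ratios only restate what the plans already report; they say nothing about the cause of the difference.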

I wonder what causes the extra memory consumption, but for now it is hard
to say whether the patch has a positive outcome.

--
regards, Andrei Lepikhov

Attachment         Content-Type     Size
bench_results.txt  text/plain       4.5 KB
synth.sql          application/sql  1.8 KB
