From: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | David Rowley <dgrowley(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Optimize WindowAgg's use of tuplestores |
Date: | 2024-09-06 06:30:33 |
Message-ID: | CAExHW5uTr2PUhinHqD=ajmwEqVaiG5n82bVJYz0AvL6WfEkMJg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Sep 4, 2024 at 8:20 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Mon, 19 Aug 2024 at 22:01, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > To try and move this forward again, I adjusted the patch to use a
> > static function with pg_noinline rather than unlikely. I don't think
> > this will make much difference code generation wise, but I did think
> > it was an improvement in code cleanliness. Patches attached.
> >
> > I did a round of benchmarking on an AMD Zen4 7945hx and on an Apple
> > M2. I also graphed the results you sent so they're easier to compare
> > with mine.
> >
> > 0001 is effectively the unlikely() patch for calculating the frame offsets.
> > 0002 is the tuplestore_reset() patch
>
> I was experimenting with this again. The 0002 patch added a
> next_partition field to the WindowAggState struct and caused the
> struct to become slightly bigger. I've now included a 0003 patch
> which shifts some fields around in that struct so as to keep it the
> same size as it is on master. Benchmarking with that removes that very
> tiny performance regression.
If patches are applied in the same sequence as yours, the size of
WindowAggState struct goes from 632 to 640 and then back to 632 on my
laptop. That looks a tiny but nice improvement by itself.
If the patches are applied in the order 0001, 0003 and 0002, the size
of the structure remains 632 throughout. Patch 0003 does not affect
the size of the structure by itself.
> I also
> tested this on an AMD 3990x machine along with fresh results from the
> AMD 7945hx laptop. Both of those machines come out faster on all tests
> when comparing master to all 3 patches. With the Apple M2, there does
> not seem to be much change in performance with the tests containing
> fewer rows per partition, some are faster, some are slower, all within
> typical noise fluctuations.
I have similar observations as yours on my amd64 laptop. I also
verified that 0003 by itself is not effective. This indicates that the
(atleast some of the) regression caused by 0002 comes from larger
structure. Why would that happen?
>
> Given the performance now seems improved in all cases, I plan on
> pushing this change as a single commit.
>
Agreed. I will review the code in detail by next week.
--
Best Wishes,
Ashutosh Bapat
From | Date | Subject | |
---|---|---|---|
Next Message | Ashutosh Bapat | 2024-09-06 07:07:55 | Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN |
Previous Message | Tatsuo Ishii | 2024-09-06 06:02:37 | Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN |