Re: Optimize WindowAgg's use of tuplestores

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: David Rowley <dgrowley(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Optimize WindowAgg's use of tuplestores
Date: 2024-08-19 10:01:22
Message-ID: CAApHDvpU1n=rM-N-gZF3+im24qy7wTttm1mi23eoWVyitzPZYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 15 Jul 2024 at 18:02, Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> There is significant improvement with a large number of partitions as
> seen previously. But for a smaller number of partitions the
> performance worsens, which needs some investigation.

We get performance variations all the time from unrelated changes due
to alignment changes in the code layout. There's a write-up in [1]
that you might find interesting. In particular the following
paragraph:

"Unfortunately, modern architectural features make this approach
unsound. Statistically sound evaluation requires multiple samples to
test whether one can or cannot (with high confidence) reject the null
hypothesis that results are the same before and after. However, caches
and branch predictors make performance dependent on machine-specific
parameters and the exact layout of code, stack frames, and heap
objects. A single binary constitutes just one sample from the space of
program layouts, regardless of the number of runs. Since compiler
optimizations and code changes also alter layout, it is currently
impossible to distinguish the impact of an optimization from that of
its layout effects."

Since the code changes here add no additional work, they only remove
work. I think any regressions you see must be related to code
alignment.

To try and move this forward again, I adjusted the patch to use a
static function with pg_noinline rather than unlikely. I don't think
this will make much difference code generation wise, but I did think
it was an improvement in code cleanliness. Patches attached.

I did a round of benchmarking on an AMD Zen4 7945hx and on an Apple
M2. I also graphed the results you sent so they're easier to compare
with mine.

0001 is effectively the unlikely() patch for calculating the frame offsets.
0002 is the tuplestore_reset() patch

The AMD laptop results were a bit noisy. M2 results were much more stable.

David

[1] https://emeryberger.com/research/stabilizer/

Attachment Content-Type Size
v3-0001-Speedup-WindowAgg-code-by-moving-uncommon-code-ou.patch application/octet-stream 5.4 KB
v3-0002-Optimize-WindowAgg-s-use-of-tuplestores.patch application/octet-stream 8.4 KB
image/png 72.8 KB
amd7945hx_windowagg.png image/png 75.5 KB
Ashutosh_windowagg.png image/png 76.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-08-19 10:16:16 Re: Drop database command will raise "wrong tuple length" if pg_database tuple contains toast attribute.
Previous Message Peter Eisentraut 2024-08-19 09:43:06 Re: thread-safety: gmtime_r(), localtime_r()