Re: Optimize WindowAgg's use of tuplestores

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Optimize WindowAgg's use of tuplestores
Date: 2024-07-18 11:49:54
Message-ID: CAApHDvpJyvaKpye+5CzE8=22zJ5UhEynCo-KY8-gteSHctMvyA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 18 Jul 2024 at 19:56, Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
> I'm curious about why a 'unlikey' change can cause noticeable change,
> especially there is just one function call in the 'if-statement' (I am
> thinking more instrucments in the if-statement body, more changes it can
> cause).
>
> + if (unlikely(winstate->buffer == NULL))
> + prepare_tuplestore(winstate);

This isn't the branch being discussed. We've not talked about whether
the above one is useful or not. The branch we were discussing is the
"if (winstate->all_first)", of which has a large number of
instructions inside it.

However, for this one, my understanding of this particular case is,
it's a very predictable branch as, even if the first prediction gets
it wrong the first time, it's not going to be wrong for long as the
condition is false for all subsequent calls. So, from there, if you
assume that the instruction decoder is always going to fetch the
correct instructions according to the branch predictor's correct
prediction (the ones for branch not taken), then the only benefit
possible is that the next instructions to execute are the next
instructions in the code rather than instructions located far away,
possibly on another cacheline that needs to be loaded from RAM or
higher cache levels. Adding the unlikely() should coax the compiler
into laying out the code so the branch's instructions at the end of
the function and that, in theory, should reduce the likelihood of
frontend stalls waiting for cachelines for further instructions to
arrive from RAM or higher cache levels. That's my theory, at least. I
expected to see perf stat show me a lower "stalled-cycles-frontend"
with the v2 patch than v1, but didn't see that when I looked, so my
theory might be wrong.

For the case you're mentioning above, calling the function isn't very
many instructions, so the benefits are likely very small with a branch
this predictable.

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2024-07-18 11:55:28 Re: CI, macports, darwin version problems
Previous Message Amit Kapila 2024-07-18 11:42:46 Re: Slow catchup of 2PC (twophase) transactions on replica in LR