From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Make tuple deformation faster |
Date: | 2024-07-25 03:18:15 |
Message-ID: | CANWCAZZe63DHpCEttKKf-sgj7726QtE0Vwm4jCX42a9x1oJ+=g@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Jul 1, 2024 at 5:07 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> cycles idle
> 8505168 stalled-cycles-backend:u # 0.02% backend cycles idle
> 165442142326 instructions:u # 3.35 insn per cycle
> # 0.00 stalled cycles per insn
> 39409877343 branches:u # 3.945 G/sec
> 146350275 branch-misses:u # 0.37% of all branches
> patched
> cycles idle
> 24259785 stalled-cycles-backend:u # 0.05% backend cycles idle
> 213688149862 instructions:u # 4.29 insn per cycle
> # 0.00 stalled cycles per insn
> 44147675129 branches:u # 4.420 G/sec
> 14282567 branch-misses:u # 0.03% of all branches
> You can see the branch predictor has done a *much* better job in the
> patched code vs master with about 10x fewer misses. This should have

Nice!

> helped contribute to the "insn per cycle" increase. 4.29 is quite
> good for postgres. I often see that around 0.5. According to [1]
> (relating to Zen4), "We get a ridiculous 12 NOPs per cycle out of the
> micro-op cache". I'm unsure how micro-ops translate to "insn per
> cycle" that's shown in perf stat. I thought 4-5 was about the maximum
> pipeline size from today's era of CPUs.

"insn per cycle" is micro-ops retired (i.e. it excludes those executed
speculatively on a mispredicted branch).

That article mentions that 6 micro-ops per cycle can enter the backend
from the frontend, but that can happen only with internally cached
ops, since only 4 instructions per cycle can be decoded. In specific
cases, CPUs can fuse multiple front-end instructions into a single
macro-op, which I think means a pair of micro-ops that can "travel
together" as one. The authors concluded further down that "Zen 4’s
reorder buffer is also special, because each entry can hold up to 4
NOPs. Pairs of NOPs are likely fused by the decoders, and pairs of
fused NOPs are fused again at the rename stage."
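
As a quick sanity check on the "about 10x fewer misses" remark, the ratios can be recomputed from the counters in the quoted perf stat output. This is only a rough sketch: the numbers are copied verbatim from the quote, while the dictionary layout and the `miss_rate` helper are names made up here for illustration.

```python
# Counter values copied from the quoted perf stat output (master vs patched).
# The dict keys and the helper below are illustrative, not perf's own names.
master = {"branches": 39_409_877_343, "branch_misses": 146_350_275}
patched = {"branches": 44_147_675_129, "branch_misses": 14_282_567}


def miss_rate(counters):
    """Fraction of executed branches that were mispredicted."""
    return counters["branch_misses"] / counters["branches"]


# How many times fewer mispredictions the patched build had in absolute terms.
miss_reduction = master["branch_misses"] / patched["branch_misses"]

print(f"master  miss rate: {miss_rate(master):.2%}")
print(f"patched miss rate: {miss_rate(patched):.2%}")
print(f"misses reduced by: {miss_reduction:.1f}x")
```

The rates come out matching the percentages perf printed (0.37% and 0.03%), and the absolute miss count drops by a little over 10x even though the patched run executed more branches overall.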