From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | Re: Improve CRC32C performance on SSE4.2 |
Date: | 2025-03-06 11:45:40 |
Message-ID: | CANWCAZbhL1EAjzZ7s_PBvmG2Mu=1bx1gvzbht5opitdV_Z48Rw@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Mar 5, 2025 at 10:52 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Wed, Mar 05, 2025 at 08:51:21AM +0700, John Naylor wrote:
> > That was my hunch too, but I wanted to be more sure, so I modified the
> > benchmark so it doesn't know the address of the next calculation until
> > it finishes the last calculation so we can hopefully see the latency
> > caused by indirection. It also does an additional calculation on
> > constant 20 bytes, like the WAL header. I also tweaked the length each
> > iteration so the branch predictor maybe has a harder time predicting
> > the constant 20 input. And to make it more challenging, I removed the
> > part that inlined all small inputs, so it inlines only constant
> > inputs:
>
> Would you mind sharing this test?
The test script is the same as here, except I only ran small lengths:
...but I must have forgotten to attach the slightly tweaked patch set,
which I've done now. 0002 modifies the 0001 test module and 0006
reverts inlining non-constant input from 0005, just to see if I could
find a regression from indirection, which I didn't. If we don't need
it, it'd be better to avoid inlining loops to keep from bloating the
binary.
> It sounds like you are running a
> workload with a mix of constant/inlined calls and function pointer calls to
> simulate typical usage for WAL, but I'm not 100% sure I'm understanding you
> correctly.
Exactly.
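
For readers without the patches at hand, here is a rough standalone
sketch of the idea behind the tweaked benchmark. It is not the code from
0002: the names (crc32c_sw, crc32c_indirect, bench_dependent, HDR_LEN)
are made up for illustration, and a slow bitwise CRC-32C stands in for
the real implementations. The point is only that the next input address
is derived from the previous result, so each call through the function
pointer has to wait for the one before it, and that a constant 20-byte
call is mixed in on every iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (reflected polynomial 0x82F63B78); slow but portable. */
static uint32_t
crc32c_sw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78 & (0U - (crc & 1)));
    }
    return crc;
}

/* Hypothetical stand-in for an implementation chosen at runtime. */
static uint32_t (*crc32c_indirect) (uint32_t, const void *, size_t) = crc32c_sw;

#define HDR_LEN 20              /* constant-length input, like the WAL header */

/*
 * Run "iters" dependent CRC computations over buf.  The offset of each
 * input is derived from the previous CRC, so the loads and the indirect
 * call cannot be started before the previous iteration has finished.
 */
static uint32_t
bench_dependent(const unsigned char *buf, size_t buflen, size_t maxlen, int iters)
{
    uint32_t    crc = 0;
    size_t      off = 0;

    assert(buflen > maxlen + HDR_LEN);

    for (int i = 0; i < iters; i++)
    {
        /* vary the length each iteration, in the range 1..maxlen */
        size_t      len = 1 + (size_t) i % maxlen;
        uint32_t    c = 0xFFFFFFFF;

        c = crc32c_indirect(c, buf + off, len); /* call via function pointer */
        c = crc32c_sw(c, buf + off, HDR_LEN);   /* direct call, constant length */
        crc = c ^ 0xFFFFFFFF;

        /* the next address is unknown until this result is available */
        off = crc % (buflen - maxlen - HDR_LEN);
    }
    return crc;
}
```

Timing bench_dependent() over a buffer of a few kB with small maxlen
values would be one way to look for latency from the indirect call; the
actual numbers I reported come from the attached test module, not from
this sketch.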
--
John Naylor
Amazon Web Services
Attachment | Content-Type | Size |
---|---|---|
v12-0006-Only-inline-for-constant-input-partial-revert.patch | text/x-patch | 1.0 KB |
v12-0004-Improve-CRC32C-performance-on-x86_64.patch | text/x-patch | 8.2 KB |
v12-0003-Inline-CRC-computation-for-small-fixed-length-in.patch | text/x-patch | 5.0 KB |
v12-0005-Use-runtime-check-even-when-we-have-SSE-4.2-at-c.patch | text/x-patch | 3.9 KB |
v12-0002-Attempt-to-make-benchmark-more-sensitive-to-late.patch | text/x-patch | 1.4 KB |
v12-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmark.patch | text/x-patch | 6.4 KB |