Re: Improve CRC32C performance on SSE4.2

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>
Subject: Re: Improve CRC32C performance on SSE4.2
Date: 2025-03-06 11:45:40
Message-ID: CANWCAZbhL1EAjzZ7s_PBvmG2Mu=1bx1gvzbht5opitdV_Z48Rw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 5, 2025 at 10:52 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Wed, Mar 05, 2025 at 08:51:21AM +0700, John Naylor wrote:
> > That was my hunch too, but I wanted to be more sure, so I modified the
> > benchmark so it doesn't know the address of the next calculation until
> > it finishes the last calculation so we can hopefully see the latency
> > caused by indirection. It also does an additional calculation on
> > constant 20 bytes, like the WAL header. I also tweaked the length each
> > iteration so the branch predictor maybe has a harder time predicting
> > the constant 20 input. And to make it more challenging, I removed the
> > part that inlined all small inputs, so it inlines only constant
> > inputs:
>
> Would you mind sharing this test?

The test script is the same as here, except I only ran small lengths:

https://www.postgresql.org/message-id/CANWCAZahvhE-%2BhtZiUyzPiS5e45ukx5877mD-dHr-KSX6LcdjQ%40mail.gmail.com

...but I must have forgotten to attach the slightly tweaked patch set,
which I've done now. 0002 modifies the 0001 test module, and 0006
reverts the inlining of non-constant input from 0005, just to see if I
could find a regression from indirection, which I didn't. If we don't
need it, it'd be better to avoid inlining loops to keep from bloating
the binary.

> It sounds like you are running a
> workload with a mix of constant/inlined calls and function pointer calls to
> simulate typical usage for WAL, but I'm not 100% sure I'm understanding you
> correctly.

Exactly.
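
For readers without the attachments, here is a rough sketch of that kind of
benchmark loop. It is not the code in 0002: every name below is made up for
illustration, and a plain bitwise CRC-32C stands in for the real
implementations. The variable-length call goes through a function pointer,
its address depends on the previous result, and each iteration also does a
direct call on a constant 20 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
 * A slow stand-in for the real implementations; no init/final XOR here. */
static uint32_t
crc32c_sw(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--)
	{
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & ((uint32_t) 0 - (crc & 1)));
	}
	return crc;
}

/* Hypothetical stand-in for a runtime-selected implementation; volatile
 * keeps the compiler from turning the indirect call into a direct one. */
static uint32_t (*volatile crc32c_indirect) (uint32_t, const void *, size_t) = crc32c_sw;

/* Hypothetical benchmark loop: requires max_len > 0 and buf_len > max_len. */
uint32_t
chained_crc_bench(const unsigned char *buf, size_t buf_len,
				  size_t max_len, int iterations)
{
	static const unsigned char header[20] = {0};	/* constant-size input */
	uint32_t	crc = 0;
	size_t		offset = 0;

	assert(max_len > 0 && buf_len > max_len);

	for (int i = 0; i < iterations; i++)
	{
		/* vary the length each iteration, always within 1..max_len */
		size_t		len = 1 + ((size_t) i * 7) % max_len;
		uint32_t	c = 0xFFFFFFFFu;

		/* variable-length input: call through the function pointer */
		c = crc32c_indirect(c, buf + offset, len);
		/* constant 20-byte input: direct call the compiler may inline */
		c = crc32c_sw(c, header, sizeof(header));
		crc = c ^ 0xFFFFFFFFu;

		/* the next address is unknown until this result is available */
		offset = crc % (buf_len - max_len);
	}
	return crc;
}
```

Timing chained_crc_bench() over a buffer of random bytes with a small
max_len should, under these assumptions, expose call latency rather than
throughput, since no iteration can start before the previous one finishes.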

--
John Naylor
Amazon Web Services

Attachment                                                       Content-Type  Size
v12-0006-Only-inline-for-constant-input-partial-revert.patch     text/x-patch  1.0 KB
v12-0004-Improve-CRC32C-performance-on-x86_64.patch              text/x-patch  8.2 KB
v12-0003-Inline-CRC-computation-for-small-fixed-length-in.patch  text/x-patch  5.0 KB
v12-0005-Use-runtime-check-even-when-we-have-SSE-4.2-at-c.patch  text/x-patch  3.9 KB
v12-0002-Attempt-to-make-benchmark-more-sensitive-to-late.patch  text/x-patch  1.4 KB
v12-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmark.patch  text/x-patch  6.4 KB
