From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org, Xiang(dot)Gao(at)arm(dot)com |
Subject: | Re: always use runtime checks for CRC-32C instructions |
Date: | 2023-10-31 19:16:16 |
Message-ID: | 2613682.1698779776@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
> On Mon, Oct 30, 2023 at 10:36:01PM -0500, Nathan Bossart wrote:
>> I tested pg_waldump -z with 50M 65-byte records for the following
>> implementations on an ARM system:
>>
>> * slicing-by-8 : ~3.08s
>> * proposed patches applied (runtime check) : ~2.44s
>> * only CRC intrinsics implementation compiled : ~2.42s
>> * forced inlining : ~2.38s
>>
>> Avoiding the runtime check produced a 0.8% improvement, and forced inlining
>> produced another 1.7% improvement. In comparison, even the runtime check
>> implementation produced a 20.8% improvement over the slicing-by-8 one.
I find these numbers fairly concerning. If you can see a
couple-of-percent slowdown on a macroscopic benchmark like pg_waldump,
that implies that the percentage slowdown considering the CRC
operation alone is much worse. So there may be other use-cases where
we would take a bigger relative hit.
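
A back-of-the-envelope sketch of that inference, in Python. It assumes that CRC computation accounts for some fraction of the total pg_waldump runtime and that all of the added time comes from the CRC path. The fractions tried below are hypothetical; nothing in this thread measures the actual share.

```python
def crc_only_slowdown(total_before, total_after, crc_fraction):
    """Relative slowdown of the CRC portion alone, given whole-program timings.

    crc_fraction is the (hypothetical) share of total_before spent computing
    CRCs. All extra time is attributed to the CRC path.
    """
    if not 0 < crc_fraction <= 1:
        raise ValueError("crc_fraction must be in (0, 1]")
    crc_before = total_before * crc_fraction
    extra = total_after - total_before
    return extra / crc_before


# Timings quoted above: intrinsics-only build vs. runtime-check build (seconds).
no_check, with_check = 2.42, 2.44

# Hypothetical CRC shares of total runtime; the real share is not known here.
for f in (1.0, 0.5, 0.25, 0.1):
    pct = 100 * crc_only_slowdown(no_check, with_check, f)
    print(f"CRC share {f:4.0%}: CRC-only slowdown ~{pct:.1f}%")
```

The smaller the share of runtime that CRC actually occupies, the larger the slowdown of the CRC operation itself must be to produce the same whole-program difference.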
> * From my quick scan of a few dozen machines on the buildfarm, it looks
> like the runtime checks are already the norm, so the number of systems
> that would be subject to a regression from v16 to v17 should be pretty
> small, in theory. And this regression seems to be on the order of 1%
> based on the numbers above.
I did a more thorough scrape of the buildfarm results. Of 161 animals
currently reporting configure output on HEAD, we have

      2 ARMv8 CRC instructions
     36 ARMv8 CRC instructions with runtime check
      2 LoongArch CRCC instructions
      2 SSE 4.2
     52 SSE 4.2 with runtime check
     67 slicing-by-8

While that'd seem to support your conclusion, the two using ARM CRC
*without* a runtime check are my Apple M1 Mac animals (sifaka/indri);
and I see the same selection on my laptop. So one platform where
we'd clearly be taking a regression is M-series Macs; that's a pretty
popular platform. The two using SSE without a check are prion and
tayra. I notice those are using gcc 11; so perhaps the default cflags
have changed to include -msse4.2 recently? I couldn't see much other
pattern though. (Scraping results attached in case anybody wants to
look.)
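
The tally above can be grouped by whether a runtime check is involved. A small Python sketch follows; the counts are copied from the list above, and the three bucket names are this sketch's own grouping, not something configure reports.

```python
# Counts copied from the buildfarm tally above (161 animals in total).
counts = {
    "ARMv8 CRC instructions": 2,
    "ARMv8 CRC instructions with runtime check": 36,
    "LoongArch CRCC instructions": 2,
    "SSE 4.2": 2,
    "SSE 4.2 with runtime check": 52,
    "slicing-by-8": 67,
}


def bucket(selection):
    """Classify a configure result; the bucket names are this sketch's own."""
    if selection == "slicing-by-8":
        return "software only"
    if selection.endswith("with runtime check"):
        return "hardware CRC, runtime check"
    return "hardware CRC, no runtime check"


totals = {}
for selection, n in counts.items():
    key = bucket(selection)
    totals[key] = totals.get(key, 0) + n

grand_total = sum(counts.values())
for name, n in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name:32s} {n:3d}  ({n / grand_total:.1%})")
```

Only the "hardware CRC, no runtime check" bucket would be exposed to a regression from always using runtime checks.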
Really this just reinforces my concern that doing a runtime check
all the time is on the wrong side of history. I grant that we've
got to do that for anything where the availability of the instruction
is really in serious question, but I'm not very convinced that that's
a majority situation on popular platforms.

			regards, tom lane

Attachment | Content-Type | Size |
---|---|---|
results.csv | text/plain | 13.3 KB |