From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Cc: | Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c) |
Date: | 2022-05-18 22:57:02 |
Message-ID: | CAApHDvruee_36_fWSjeCkXiUg04FJKQBVBZZpF9rY7qEdchNPA@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, 19 May 2022 at 02:08, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> wrote:
> That would initialize the content at compilation and not at runtime, correct?
Your mental model of compilation and run-time might be flawed here.
There's no such thing as zeroing memory at compile time. There's only
emitting instructions that perform those tasks at run-time.
https://godbolt.org/ might help your understanding.
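As a rough illustration of that point, here is a minimal sketch; the struct and function names are invented for this example and are not taken from relcache.c. Both functions zero a local struct, one with an initializer and one with memset(). In either case the compiler emits instructions that run each time the function is entered, which is easy to confirm by pasting the code into godbolt.org.

```c
#include <string.h>

/* Hypothetical struct, used only for illustration. */
typedef struct DemoEntry
{
    int  id;
    long counts[4];
} DemoEntry;

/* Zeroed via an initializer: the stores still happen at run-time. */
long
sum_with_initializer(int id)
{
    DemoEntry e = {0};

    e.id = id;
    return e.id + e.counts[0] + e.counts[1] + e.counts[2] + e.counts[3];
}

/* Zeroed via memset(): the same work, spelled as a library call. */
long
sum_with_memset(int id)
{
    DemoEntry e;

    memset(&e, 0, sizeof(e));
    e.id = id;
    return e.id + e.counts[0] + e.counts[1] + e.counts[2] + e.counts[3];
}
```

With optimisation enabled, a compiler is free to generate identical code for both functions, so comparing the two listings side by side is more informative than reasoning about the source.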
> There are a lot of cases using MemSet (with struct variables) and at Windows 64 bits, long are 4 (four) bytes.
> So I believe that MemSet is less efficient on Windows than on Linux.
> "The size of the '_vstart' buffer is not a multiple of the element size of the type 'long'."
> message from PVS-Studio static analysis tool.
I've been wondering for a while if we really need to have the MemSet()
macro. I see it was added in 8cb415449 (1997). I think compilers have
evolved quite a bit in the past 25 years, so it could be time to
revisit that.
Your comment on the sizeof(long) on win64 is certainly true. I wrote
the attached C program to test the performance difference.
(windows 64-bit)
>cl memset.c /Ox
>memset 200000000
Running 200000000 loops
MemSet: size 8: 1.833000 seconds
MemSet: size 16: 1.841000 seconds
MemSet: size 32: 1.838000 seconds
MemSet: size 64: 1.851000 seconds
MemSet: size 128: 3.228000 seconds
MemSet: size 256: 5.278000 seconds
MemSet: size 512: 3.943000 seconds
memset: size 8: 0.065000 seconds
memset: size 16: 0.131000 seconds
memset: size 32: 0.262000 seconds
memset: size 64: 0.530000 seconds
memset: size 128: 1.169000 seconds
memset: size 256: 2.950000 seconds
memset: size 512: 3.191000 seconds
It seems like there are no cases there where MemSet is faster than
memset. I was careful to only provide MemSet() with inputs that
result in it not using the memset fallback. I also provided constants
so that the decision about which method to use was known at compile
time.
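For readers who do not have the macro in front of them, the following is a simplified sketch of the general shape of such a macro, not a copy of PostgreSQL's definition. The names DEMO_MEMSET, DEMO_LOOP_LIMIT and demo_uses_word_loop are hypothetical, and the limit of 1024 bytes is an assumption made for this example. It shows which inputs take the long-at-a-time loop and which fall back to memset().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cut-off for the example; larger requests use memset(). */
#define DEMO_LOOP_LIMIT 1024
#define DEMO_LONG_MASK  (sizeof(long) - 1)

/*
 * Returns 1 when the word loop would be used: the pointer and length are
 * long-aligned, the fill value is zero, and the length is within the limit.
 */
int
demo_uses_word_loop(const void *start, int val, size_t len)
{
    return (((uintptr_t) start) & DEMO_LONG_MASK) == 0 &&
        (len & DEMO_LONG_MASK) == 0 &&
        val == 0 &&
        len <= DEMO_LOOP_LIMIT;
}

/* Zero memory one long at a time when possible, else call memset(). */
#define DEMO_MEMSET(start, val, len) \
    do { \
        void   *_vstart = (void *) (start); \
        int     _val = (val); \
        size_t  _len = (len); \
        if (demo_uses_word_loop(_vstart, _val, _len)) \
        { \
            long *_p = (long *) _vstart; \
            long *_stop = (long *) ((char *) _vstart + _len); \
            while (_p < _stop) \
                *_p++ = 0; \
        } \
        else \
            memset(_vstart, _val, _len); \
    } while (0)
```

When the value and length are compile-time constants, as in the test described above, an optimising compiler can decide most of the condition at compile time and keep only one of the two branches.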
It's not clear to me why MemSet at size 512 is faster than at size 256. I saw the same on a repeat run.
After changing "long" to "long long" in MemSet, it looks like:
>memset 200000000
Running 200000000 loops
MemSet: size 8: 0.066000 seconds
MemSet: size 16: 1.978000 seconds
MemSet: size 32: 1.982000 seconds
MemSet: size 64: 1.973000 seconds
MemSet: size 128: 1.970000 seconds
MemSet: size 256: 3.225000 seconds
MemSet: size 512: 5.366000 seconds
memset: size 8: 0.069000 seconds
memset: size 16: 0.132000 seconds
memset: size 32: 0.265000 seconds
memset: size 64: 0.527000 seconds
memset: size 128: 1.161000 seconds
memset: size 256: 2.976000 seconds
memset: size 512: 3.179000 seconds
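One thing that changes between the two Windows runs is the width of each store in the loop. The sketch below uses hypothetical function names and is not the attached memset.c. It zeroes a buffer with either type and reports how many stores were needed. Where long is 4 bytes, the long version needs twice as many stores as the long long version for the same length. This is only an illustration of the store count and does not claim to account for every row in the timings above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Zero len bytes one long at a time; returns the number of stores. */
size_t
zero_with_long(void *buf, size_t len)
{
    long   *p = (long *) buf;
    long   *stop = (long *) ((char *) buf + len);
    size_t  stores = 0;

    while (p < stop)
    {
        *p++ = 0;
        stores++;
    }
    return stores;
}

/* Same loop with long long as the word type. */
size_t
zero_with_long_long(void *buf, size_t len)
{
    long long *p = (long long *) buf;
    long long *stop = (long long *) ((char *) buf + len);
    size_t     stores = 0;

    while (p < stop)
    {
        *p++ = 0;
        stores++;
    }
    return stores;
}
```

Both functions assume that buf is suitably aligned and that len is a multiple of the word size, as memory returned by malloc() with a length of 128 would be.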
The situation is a little different on my Linux machine:
$ gcc memset.c -o memset -O2
$ ./memset 200000000
Running 200000000 loops
MemSet: size 8: 0.000002 seconds
MemSet: size 16: 0.000000 seconds
MemSet: size 32: 0.094041 seconds
MemSet: size 64: 0.184618 seconds
MemSet: size 128: 1.781503 seconds
MemSet: size 256: 2.547910 seconds
MemSet: size 512: 4.005173 seconds
memset: size 8: 0.046156 seconds
memset: size 16: 0.046123 seconds
memset: size 32: 0.092291 seconds
memset: size 64: 0.184509 seconds
memset: size 128: 1.781518 seconds
memset: size 256: 2.577104 seconds
memset: size 512: 4.004757 seconds
It looks like part of the work might be getting optimised away in the
8-16 MemSet() calls.
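Near-zero timings like these are typical when the compiler can prove that nobody reads the zeroed memory. The following sketch, with hypothetical function names and not taken from the attached memset.c, contrasts a loop that a compiler is allowed (though not required) to drop with one whose result is observed through a volatile read. The volatile read only pins down the byte that is read; keeping the whole buffer's stores alive generally needs a stronger, compiler-specific barrier.

```c
#include <stddef.h>
#include <string.h>

/*
 * Nothing reads buf afterwards, so an optimising compiler may remove
 * the memset() calls, or the whole loop.
 */
void
zero_unobserved(size_t loops)
{
    char buf[16];

    for (size_t i = 0; i < loops; i++)
        memset(buf, 0, sizeof(buf));
}

/*
 * Reading one byte through a volatile pointer after each memset() means
 * the loop has an observable effect, so in practice it cannot be dropped.
 * Returns the sum of the bytes read, which is 0 for a zeroed buffer.
 */
unsigned long
zero_observed(size_t loops)
{
    char          buf[16];
    unsigned long sink = 0;

    for (size_t i = 0; i < loops; i++)
    {
        memset(buf, 0, sizeof(buf));
        sink += (unsigned char) *(volatile char *) buf;
    }
    return sink;
}
```

Timing both variants at the same optimisation level gives a quick indication of whether a benchmark is measuring the zeroing or an empty loop.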
With clang the picture seems to be reversed for size 8: there it is the
memset call, rather than MemSet, that appears to be optimised away.
$ clang memset.c -o memset -O2
$ ./memset 200000000
Running 200000000 loops
MemSet: size 8: 0.007653 seconds
MemSet: size 16: 0.005771 seconds
MemSet: size 32: 0.011539 seconds
MemSet: size 64: 0.023095 seconds
MemSet: size 128: 0.046130 seconds
MemSet: size 256: 0.092269 seconds
MemSet: size 512: 0.968564 seconds
memset: size 8: 0.000000 seconds
memset: size 16: 0.005776 seconds
memset: size 32: 0.011559 seconds
memset: size 64: 0.023069 seconds
memset: size 128: 0.046129 seconds
memset: size 256: 0.092243 seconds
memset: size 512: 0.968534 seconds
There does not seem to be any significant reduction in the size of the
binary from changing the MemSet macro to directly use memset. It went
from 9865008 bytes down to 9860800 bytes (4208 bytes less).
David
Attachment | Content-Type | Size |
---|---|---|
memset.c | text/plain | 2.1 KB |