From: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c) |
Date: | 2022-05-18 23:47:14 |
Message-ID: | CAEudQApnjRsGvqUrQhpJ=MXsqBHtaVk6qYMoqhv+FDeORhgQeA@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, 18 May 2022 at 19:57, David Rowley <dgrowleyml(at)gmail(dot)com>
wrote:
> On Thu, 19 May 2022 at 02:08, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> wrote:
> > That would initialize the content at compilation and not at runtime,
> > correct?
>
> Your mental model of compilation and run-time might be flawed here.
> There's no such thing as zeroing memory at compile time. There's only
> emitting instructions that perform those tasks at run-time.
> https://godbolt.org/ might help your understanding.
>
> > There are a lot of cases using MemSet (with struct variables), and on
> > 64-bit Windows, long is 4 (four) bytes.
> > So I believe that MemSet is less efficient on Windows than on Linux.
> > "The size of the '_vstart' buffer is not a multiple of the element size
> > of the type 'long'."
> > (message from the PVS-Studio static analysis tool)
>
> I've been wondering for a while if we really need to have the MemSet()
> macro. I see it was added in 8cb415449 (1997). I think compilers have
> evolved quite a bit in the past 25 years, so it could be time to
> revisit that.
>
+1
All current compilers already ship a well-optimized memset.
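
To make the sizeof(long) point concrete, here is a minimal sketch of a MemSet-style zeroing routine. It is modelled loosely on the idea behind the macro, not copied from PostgreSQL's source; the names sketch_memset_zero, sketch_store_count and SKETCH_LOOP_LIMIT are hypothetical, and the 1024-byte cutoff is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cutoff for this sketch; larger requests go straight to memset(). */
#define SKETCH_LOOP_LIMIT 1024

/*
 * Hypothetical stand-in for a MemSet-style macro, written as a function.
 * Zeroes len bytes with long-sized stores when both the address and the
 * length are multiples of sizeof(long); otherwise it calls memset().
 */
void
sketch_memset_zero(void *start, size_t len)
{
    const uintptr_t mask = sizeof(long) - 1;

    if (((uintptr_t) start & mask) == 0 &&
        (len & mask) == 0 &&
        len <= SKETCH_LOOP_LIMIT)
    {
        long *p = (long *) start;
        long *stop = (long *) ((char *) start + len);

        /* One store per long: the store width follows sizeof(long). */
        while (p < stop)
            *p++ = 0;
    }
    else
        memset(start, 0, len);
}

/* Number of stores the loop above performs for len bytes on this platform. */
size_t
sketch_store_count(size_t len)
{
    return len / sizeof(long);
}
```

With a 4-byte long, as on 64-bit Windows, a 64-byte buffer takes 16 stores in this loop; with an 8-byte long it takes 8. memset is free to use wider stores on either platform.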
> Your comment on the sizeof(long) on win64 is certainly true. I wrote
> the attached C program to test the performance difference.
>
> (windows 64-bit)
> >cl memset.c /Ox
> >memset 200000000
> Running 200000000 loops
> MemSet: size 8: 1.833000 seconds
> MemSet: size 16: 1.841000 seconds
> MemSet: size 32: 1.838000 seconds
> MemSet: size 64: 1.851000 seconds
> MemSet: size 128: 3.228000 seconds
> MemSet: size 256: 5.278000 seconds
> MemSet: size 512: 3.943000 seconds
> memset: size 8: 0.065000 seconds
> memset: size 16: 0.131000 seconds
> memset: size 32: 0.262000 seconds
> memset: size 64: 0.530000 seconds
> memset: size 128: 1.169000 seconds
> memset: size 256: 2.950000 seconds
> memset: size 512: 3.191000 seconds
>
> It seems like there are no cases there where MemSet is faster than
> memset. I was careful to only provide MemSet() with inputs that
> result in it not using the memset fallback. I also provided constants
> so that the decision about which method to use was known at compile
> time.
>
> It's not clear to me why 512 is faster than 256.
Possibly an alignment problem at 256?
Another warning from PVS-Studio:
[1] "The pointer '_start' is cast to a more strictly aligned pointer type."
src/contrib/postgres_fdw/connection.c (Line 1690)
MemSet(values, 0, sizeof(values));
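
As a rough illustration of what that warning is about: casting a pointer of unknown alignment to long * is only safe if the address is suitably aligned for long. A minimal sketch of such a check follows; is_long_aligned is a hypothetical helper, and treating "multiple of sizeof(long)" as the alignment requirement is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns nonzero when the address in p is a
 * multiple of sizeof(long), which this sketch assumes is sufficient
 * alignment for storing through a long *.
 */
int
is_long_aligned(const void *p)
{
    return ((uintptr_t) p & (sizeof(long) - 1)) == 0;
}
```

A routine that runs a check like this before the cast, and falls back to memset otherwise, never stores through a misaligned long *. A static analyzer may still flag the cast itself.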
> I saw the same on a repeat run.
>
> Changing "long" to "long long" it looks like:
>
> >memset 200000000
> Running 200000000 loops
> MemSet: size 8: 0.066000 seconds
> MemSet: size 16: 1.978000 seconds
> MemSet: size 32: 1.982000 seconds
> MemSet: size 64: 1.973000 seconds
> MemSet: size 128: 1.970000 seconds
> MemSet: size 256: 3.225000 seconds
> MemSet: size 512: 5.366000 seconds
> memset: size 8: 0.069000 seconds
> memset: size 16: 0.132000 seconds
> memset: size 32: 0.265000 seconds
> memset: size 64: 0.527000 seconds
> memset: size 128: 1.161000 seconds
> memset: size 256: 2.976000 seconds
> memset: size 512: 3.179000 seconds
>
> The situation is a little different on my Linux machine:
>
> $ gcc memset.c -o memset -O2
> $ ./memset 200000000
> Running 200000000 loops
> MemSet: size 8: 0.000002 seconds
> MemSet: size 16: 0.000000 seconds
> MemSet: size 32: 0.094041 seconds
> MemSet: size 64: 0.184618 seconds
> MemSet: size 128: 1.781503 seconds
> MemSet: size 256: 2.547910 seconds
> MemSet: size 512: 4.005173 seconds
> memset: size 8: 0.046156 seconds
> memset: size 16: 0.046123 seconds
> memset: size 32: 0.092291 seconds
> memset: size 64: 0.184509 seconds
> memset: size 128: 1.781518 seconds
> memset: size 256: 2.577104 seconds
> memset: size 512: 4.004757 seconds
>
> It looks like part of the work might be getting optimised away in the
> 8-16 MemSet() calls.
>
On Linux, long is 8 bytes.
I'm still surprised that MemSet (8/16) is faster.
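
One way to test the "optimised away" theory is to time the calls with a compiler barrier after each one, so the stores cannot be dropped as dead. The following is a minimal sketch under stated assumptions, not the attached memset.c (which is not shown here); time_memset and SKETCH_BARRIER are hypothetical names, and the barrier uses the GCC/Clang inline-asm extension.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#if defined(__GNUC__) || defined(__clang__)
/* Empty asm with a "memory" clobber: the compiler must assume memory is
 * read here, so the preceding stores have to be emitted. */
#define SKETCH_BARRIER() __asm__ volatile("" : : : "memory")
#else
/* Assumption: no barrier on other compilers, so their timings may still
 * reflect eliminated stores. */
#define SKETCH_BARRIER() ((void) 0)
#endif

/* Hypothetical helper: seconds, as reported by clock(), spent on
 * `loops` calls of memset over a len-byte buffer. */
double
time_memset(void *buf, size_t len, long loops)
{
    clock_t start = clock();

    for (long i = 0; i < loops; i++)
    {
        memset(buf, 0, len);
        SKETCH_BARRIER();
    }
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

If the size 8 and size 16 timings rise to roughly match the larger sizes once the barrier is in place, that would suggest the near-zero numbers came from eliminated stores rather than from a faster loop.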
> clang seems to have the opposite for size 8.
>
> $ clang memset.c -o memset -O2
> $ ./memset 200000000
> Running 200000000 loops
> MemSet: size 8: 0.007653 seconds
> MemSet: size 16: 0.005771 seconds
> MemSet: size 32: 0.011539 seconds
> MemSet: size 64: 0.023095 seconds
> MemSet: size 128: 0.046130 seconds
> MemSet: size 256: 0.092269 seconds
> MemSet: size 512: 0.968564 seconds
> memset: size 8: 0.000000 seconds
> memset: size 16: 0.005776 seconds
> memset: size 32: 0.011559 seconds
> memset: size 64: 0.023069 seconds
> memset: size 128: 0.046129 seconds
> memset: size 256: 0.092243 seconds
> memset: size 512: 0.968534 seconds
>
> There does not seem to be any significant reduction in the size of the
> binary from changing the MemSet macro to directly use memset. It went
> from 9865008 bytes down to 9860800 bytes (4208 bytes less).
>
Anyway, I think that on 64-bit Windows
it is very worthwhile to remove the MemSet macro.
regards,
Ranier Vilela