From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Simplify newNode() |
Date: | 2023-12-14 22:44:14 |
Message-ID: | 2872058.1702593854@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
> On 14/12/2023 10:32, Peter Eisentraut wrote:
>> But if we think that compilers are now smart enough, maybe we can unwind
>> this whole stack a bit more? Maybe we don't need MemSetTest() and/or
>> palloc0fast() and/or newNode() at all?
> Good point. Looking closer, modern compilers will actually turn the
> MemSetLoop() in MemoryContextAllocZeroAligned() into a call to memset()
> anyway! Funny. That is true for recent versions of gcc, clang, and MSVC.
I experimented with the same planner-intensive test case that I used
in the last discussion back in 2008. I got these results:
HEAD:
tps = 144.974195 (without initial connection time)
v1 patch:
tps = 146.302004 (without initial connection time)
v2 patch:
tps = 144.882208 (without initial connection time)
While there's not much daylight between these numbers, the times are
quite reproducible for me. This is with RHEL8's gcc 8.5.0 on x86_64.
That's probably a bit trailing-edge in terms of what people might be
using with v17, but I don't think it's discountable.
I also looked at the backend's overall code size per size(1):
HEAD:
text data bss dec hex filename
8613007 100192 220176 8933375 884fff testversion.stock/bin/postgres
v1 patch:
text data bss dec hex filename
8615126 100192 220144 8935462 885826 testversion.v1/bin/postgres
v2 patch:
text data bss dec hex filename
8595322 100192 220144 8915658 880aca testversion.v2/bin/postgres
I did check that the v1 patch successfully inlines newNode() and
reduces it to just a MemoryContextAllocZeroAligned call, so it's
correct that modern compilers do that better than whatever I tested
in 2008. But I wonder what is happening in v2 to reduce the code
size so much. MemoryContextAllocZeroAligned is not 20kB by itself.
> Good point. Looking closer, modern compilers will actually turn the
> MemSetLoop() in MemoryContextAllocZeroAligned() into a call to memset()
> anyway! Funny. That is true for recent versions of gcc, clang, and MSVC.
Not here ...
> Yeah, +1 on removing all that, including MemoryContextAllocZeroAligned.
> It's not doing any good as it is, as it gets compiled to be identical to
> MemoryContextAllocZero.
Also not so here. Admittedly, my results don't make much of a case
for keeping the two code paths, even on compilers where it matters.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2023-12-14 22:56:25 | Re: Simplify newNode() |
Previous Message | Tom Lane | 2023-12-14 21:47:05 | Re: useless LIMIT_OPTION_DEFAULT |