From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org> |
Subject: | Re: MemSet inline for newNode |
Date: | 2002-11-11 02:58:07 |
Message-ID: | 200211110258.gAB2w7C03559@candle.pha.pa.us |
Lists: | pgsql-hackers pgsql-patches |
Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > I thought someone had tested that there was little/no performance
> > difference between these two statements:
> > MemSet(ptr, 0, 256)
> > vs.
> > i = 256;
> > MemSet(ptr, 0, i)
>
> But with any reasonably smart compiler, those *will* be the same because
> the compiler can do constant-folding (ie, it knows i is 256 when control
> reaches the MemSet). Pushing the MemSet into MemoryContextAlloc
> eliminates the possibility of knowing the size argument's value.
Yes, however, I am not seeing that getting optimized with gcc 2.95 -O2.
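To make the comparison concrete, here is a minimal sketch of the three cases under discussion. The macro is a simplified stand-in for MemSet, not the actual definition. The values chosen for INT_ALIGN_MASK and MEMSET_LOOP_LIMIT and the function names zero_literal, zero_local, and zero_param are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values for this sketch, not taken from the real headers. */
#define INT_ALIGN_MASK    (sizeof(int32_t) - 1)
#define MEMSET_LOOP_LIMIT 1024

/* Simplified stand-in for MemSet: a word loop for small, aligned
 * zeroing, and a plain memset() call for everything else. */
#define MemSet(start, val, len) \
    do { \
        int32_t *_start = (int32_t *) (start); \
        int      _val = (val); \
        size_t   _len = (len); \
        if ((((uintptr_t) _start) & INT_ALIGN_MASK) == 0 && \
            (_len & INT_ALIGN_MASK) == 0 && \
            _val == 0 && \
            _len <= MEMSET_LOOP_LIMIT) \
        { \
            int32_t *_stop = (int32_t *) ((char *) _start + _len); \
            while (_start < _stop) \
                *_start++ = 0; \
        } \
        else \
            memset(_start, _val, _len); \
    } while (0)

/* Case 1: literal length. The tests on _val and _len involve only
 * constants, so a compiler can decide them at compile time. */
void zero_literal(int32_t *ptr)
{
    MemSet(ptr, 0, 256);
}

/* Case 2: length held in a local variable. With constant propagation
 * this should compile to the same code as case 1. */
void zero_local(int32_t *ptr)
{
    size_t i = 256;

    MemSet(ptr, 0, i);
}

/* Case 3: length arrives as a parameter, as it would inside an
 * allocator routine. Unless the call is inlined, the compiler cannot
 * know len, so every test runs at run time. */
void zero_param(int32_t *ptr, size_t len)
{
    assert(ptr != NULL);
    MemSet(ptr, 0, len);
}
```

All three functions zero the same bytes; they differ only in how much of the condition the compiler can resolve ahead of time. Comparing the assembly from gcc -O2 -S for zero_literal and zero_local is one way to check whether a given compiler folds the constant.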
> > I can back out my changes, but it would be easier to see if there is a
> > benefit before doing that. Personally, I am going to be surprised if a
> > single conditional test in MemSet (which is not in the assignment loop)
> > will make any measurable difference.
>
> Then why were we bothering? IIRC, this was being sold as a performance
> improvement.
Well, the palloc0 use by newNode was a performance boost, as tested by
Neil Conway. Without palloc0, there was no way to inline newNode. He
found in his tests that the palloc0 version of newNode was as fast as
his inline newNode, and it didn't cause the same code bloat.
The palloc0 calls used elsewhere in the code were merely for code clarity
and to reduce the code bloat caused by MemSet in a few cases. I will back
them out while I do some more tests.
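As a rough illustration of the idea, a node constructor built on a zeroing allocator might look like the sketch below. This is not the actual palloc0 or newNode code: malloc stands in for the memory-context allocator, and sketch_alloc0, sketch_new_node, ExampleNode, and the NodeTag values are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tags and node layout, for illustration only. */
typedef enum NodeTag { T_Invalid = 0, T_Example = 1 } NodeTag;

typedef struct Node
{
    NodeTag type;
} Node;

typedef struct ExampleNode
{
    NodeTag type;       /* must be first, so the struct can be viewed as a Node */
    int     a;
    void   *b;
} ExampleNode;

/* Stand-in for a zeroing allocator: the zeroing happens once, inside
 * the allocator, instead of being expanded at every call site. */
void *sketch_alloc0(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        memset(p, 0, size);
    return p;
}

/* Stand-in for a node constructor: allocate zeroed memory, then set the tag. */
Node *sketch_new_node(size_t size, NodeTag tag)
{
    Node *result;

    assert(size >= sizeof(Node));
    result = (Node *) sketch_alloc0(size);
    if (result != NULL)
        result->type = tag;
    return result;
}
```

The trade-off is the one described above: the call site stays small, but the size is no longer a compile-time constant where the zeroing happens.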
One thing I am concerned about is that newNode _isn't_ making use of a
constant len for MemSet, but doing it inline was causing too much bloat.
I made a new version that did the alignment test of start and len in one
pass:
    if (((((long) _start) | _len) & INT_ALIGN_MASK) == 0 && \
        _val == 0 && \
        _len <= MEMSET_LOOP_LIMIT) \
I have found that the last test is the one that slows it down.
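The combined alignment test works because a low bit set in either the address or the length is also set in their bitwise OR, so a single mask catches both. A small sketch comparing the separate and combined forms follows. The function names and the values of INT_ALIGN_MASK and MEMSET_LOOP_LIMIT are assumptions for illustration, and uintptr_t is used in place of long for portability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for this sketch, not taken from the real headers. */
#define INT_ALIGN_MASK    (sizeof(int32_t) - 1)
#define MEMSET_LOOP_LIMIT 1024

/* Alignment of start and len tested separately. */
int fast_path_separate(const void *start, int val, size_t len)
{
    return (((uintptr_t) start) & INT_ALIGN_MASK) == 0 &&
           (len & INT_ALIGN_MASK) == 0 &&
           val == 0 &&
           len <= MEMSET_LOOP_LIMIT;
}

/* Alignment of start and len tested in one pass: OR the two values
 * together, then apply the mask once. */
int fast_path_combined(const void *start, int val, size_t len)
{
    return ((((uintptr_t) start) | len) & INT_ALIGN_MASK) == 0 &&
           val == 0 &&
           len <= MEMSET_LOOP_LIMIT;
}

/* Self-check: both forms agree for every offset and length in a small range. */
void check_equivalence(void)
{
    int32_t buf[8];
    size_t  off;
    size_t  len;

    for (off = 0; off < 8; off++)
        for (len = 0; len < 16; len++)
            assert(fast_path_separate((char *) buf + off, 0, len) ==
                   fast_path_combined((char *) buf + off, 0, len));
}
```

The combined form saves one mask and one comparison, but the test against MEMSET_LOOP_LIMIT remains whenever the length is not known at compile time.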
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2002-11-11 03:10:32 | Re: MemSet inline for newNode |
Previous Message | Tom Lane | 2002-11-11 02:50:03 | Re: MemSet inline for newNode |