From: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [PATCH v3] Avoid manual shift-and-test logic in AllocSetFreeIndex |
Date: | 2009-07-20 00:47:18 |
Message-ID: | 200907201047.18603.jk@ozlabs.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Robert,
> That having been said, Jeremy, you probably want to take a look at
> those comments and I have a few responses to them as well.
OK, thanks for the heads-up.
> following comment:
> > Applied and built cleanly. Regress passes. Trying to hunt down ppc
> > box to see if performance enhancement can be seen.
>
> Question: are we only doing this because of PowerPC?
No, this patch will benefit any architecture that has a gcc
implementation of __builtin_clz. I know that both x86 and powerpc gcc
support this.
> What is going to happen on x86 and other architectures? x86 is not
> the center of the universe, but at a very minimum we need to make sure
> that things don't go backwards on what is a very common platform. Has
> anyone done any benchmarking of this?
Yes, Atsushi Ogawa did some benchmarking with this patch on x86, his
numbers are here:
http://archives.postgresql.org/message-id/4A2895C4.9050108@hi-ho.ne.jp
In fact, Atushi originally submitted a patch using inline asm (using
"bsr") to do this on x86. Coincidentally, I was working on a powerpc
implementation (using "cntlz") at the same time, so submitted a patch
using the gcc builtin that would work on both (and possibly other)
architectures.
> A related question: the original email for this patch says that it
> results in a performance increase of about 2% on PPC. But since it
> gives no details on exactly what the test was that improved by 2%,
> it's hard to judge the impact of this. If this means that
> AllocSetFreeIndex() is 2% faster, I think we should reject this patch
> and move on. It's not worth introducing architecture-specific code
> to get a performance improvement that probably won't even move the
> needle on a macro-benchmark. On the other hand, if the test was
> something like a pgbench run, then this is really very impressive.
> But we don't know which it is.
The 2% improvement I saw is for a sysbench OLTP run. I'm happy to re-do
the run and report specific numbers if you need them.
> Zdenek Kotala added this comment:
> > I have several objections:
> >
> > - inline is forbidden to use in PostgreSQL - you need exception or
> > do it differently
> >
> > - type mismatch - Size vs unsigned int vs 32. you should use only
> > Size and sizeof(Size)
OK, happy to make these changes. What's the commitfest process here,
should I redo the patch and re-send to -hackers?
(inline again: should I just make this a static, the compiler can inline
where possible? or do you want a macro?)
> > And general comment:
> >
> > Do we want to have this kind of code which is optimized for one
> > compiler and platform.
One compiler, multiple platforms.
> > See openSSL or MySQL, they have many
> > optimizations which work fine on one platform with specific version
> > of compiler and specific version of OS. But if you select different
> > version of compiler or different compiler you can get very
> > different performance result (e.g. MySQL 30% degradation, OpenSSL
> > RSA 3x faster and so on).
I don't think we're going to see a lot of variation here (besides the
difference where __builtin_clz isn't available). Given that this is
implemented with a couple of instructions, I'm confident that we won't
see any degradation for platforms that support __builtin_clz. For the
other cases, we just use the exiting while-loop algorithm so there
should be no change (unless we mess up the inlining...).
> > I think in this case, it makes sense to have optimization here, but
> > we should do it like spinlocks are implemented and put this code
> > into /port.
Unless I'm missing something, spinlocks are done the same way - we have
one file, src/include/storage/s_lock.h, which is mostly different
implementations of the lock primitives for different platforms,
separated by preprocessor tests.
Essentially, this is just one line of code, protected by
HAVE_BUILTIN_CLZ (which is a feature-test, rather than a specific
platform or compiler test), and is only used in one place. I don't think
that warrants a separate file.
Cheers,
Jeremy
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2009-07-20 00:49:20 | Re: navigation menu for documents |
Previous Message | Tom Lane | 2009-07-20 00:42:40 | Re: [PATCH 1/2 v3] [libpq] rework sigpipe-handling macros |