From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | david(dot)rowley(at)2ndquadrant(dot)com |
Cc: | Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp, sulamul(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [HACKERS] path toward faster partition pruning |
Date: | 2017-11-16 02:54:23 |
Message-ID: | 20171116.115423.140952026.horiguchi.kyotaro@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks for the interesting test, David.
At Tue, 14 Nov 2017 17:00:12 +1300, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote in <CAKJS1f-1Lc_b=Y1iicPQzvUgSn1keHSgmRqLuOGq_VR6M==zbw(at)mail(dot)gmail(dot)com>
> On 13 November 2017 at 22:46, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > On 2017/11/10 12:30, Kyotaro HORIGUCHI wrote:
> >> Without some aggressive optimization, the loop takes most of the
> >> time to check-and-jump for nothing especially with many
> >> partitions and somewhat unintuitive.
> >>
> >> The following uses a bit tricky bitmap operation but
> >> is straightforward as a whole.
> >>
> >> =====
> >> /* fill the bits upper from BITNUM(lower) (0-based) of the first word */
> >> a->workds[wordnum++] += ~(bitmapword)((1 << BITNUM(lower)) - 1);
'+='.. ^^;
> >> /* fill up intermediate words */
> >> while (wordnum < uwordnum)
> >> a->words[wordnum++] = ~(bitmapword) 0;
> >>
> >> /* fill up to BITNUM(upper) bit (0-based) of the last word */
> >> a->workds[wordnum++] |=
> >> (~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) - 1));
> >> =====
> >
> > Considering also the David's comment downthread, I will try to incorporate
> > this into bms_add_range().
>
> I've attached an implementation of the patch using this method.
>
> I've also attached bitset.c which runs each through their paces. I'd
> have expected Kyotaro's method to be faster, but gcc 7.2 with -O2
> generates very slightly slower code. I didn't really check why. clang
> seems to do a better job with it.
..
> $ gcc -O2 bitset.c -o bitset && ./bitset
> bms_add_range in 0.694254 (6.94254 ns per loop)
> bms_add_range2 in 0.726643 (7.26643 ns per loop)
..
> $ gcc --version
> gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0
> Copyright (C) 2017 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Hmm bms_add_range doesn't seem getting so aggressive optimization
but I had a similar result.
Looking the output of gcc -S, I found that bms_add_range() is
embedded in main(). (gcc 7.1.0) It's not surprizing after finding
that but.. Anyway I added __attribute((noinline)) to the two
functions and got the following result.
> bms_add_range in 1.24 (12.4 ns per loop)
> bms_add_range2 in 0.8 (8 ns per loop)
It seems reasonable.
> $ clang -O2 bitset.c -o bitset && ./bitset
> bms_add_range in 0.866554 (8.66554 ns per loop)
> bms_add_range2 in 0.467138 (4.67138 ns per loop)
..
> $ clang --version
> clang version 4.0.1-6 (tags/RELEASE_401/final)
> Target: x86_64-pc-linux-gnu
> Thread model: posix
> InstalledDir: /usr/bin
>
> Probably just go with Kyotaro's idea (v2). I don't think this is worth
> debating, I just wanted to show it's not that clear-cut.
I agree that it's not so clear-cut.
regard,
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2017-11-16 03:09:05 | Re: [HACKERS] Timeline ID in backup_label file |
Previous Message | Tom Lane | 2017-11-16 02:52:21 | Re: Further simplification of c.h's #include section |