From: | Jesse Zhang <sbjesse(at)gmail(dot)com> |
---|---|
To: | David Fetter <david(at)fetter(dot)org> |
Cc: | PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Use compiler intrinsics for bit ops in hash |
Date: | 2020-01-14 20:21:41 |
Message-ID: | CAGf+fX79VCypUUuJTbf8o1no3Sqay1tHKVsnYAEN8wvDp2_eqA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi David,
On Tue, Jan 14, 2020 at 9:36 AM David Fetter <david(at)fetter(dot)org> wrote:
>
> Folks,
>
> The recent patch for distinct windowing aggregates contained a partial
> fix of the FIXME that didn't seem entirely right, so I extracted that
> part, changed it to use compiler intrinsics, and submit it here.
The changes in hash AM and SIMPLEHASH do look like a net positive
improvement. My biggest cringe might be in pg_bitutils:
> diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
> index 498e532308..cc9338da25 100644
> --- a/src/include/port/pg_bitutils.h
> +++ b/src/include/port/pg_bitutils.h
> @@ -145,4 +145,32 @@ pg_rotate_right32(uint32 word, int n)
> return (word >> n) | (word << (sizeof(word) * BITS_PER_BYTE - n));
> }
>
> +/* ceil(lg2(num)) */
> +static inline uint32
> +ceil_log2_32(uint32 num)
> +{
> + return pg_leftmost_one_pos32(num-1) + 1;
> +}
> +
> +static inline uint64
> +ceil_log2_64(uint64 num)
> +{
> + return pg_leftmost_one_pos64(num-1) + 1;
> +}
> +
> +/* calculate first power of 2 >= num
> + * per https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
> + * using BSR where available */
> +static inline uint32
> +next_power_of_2_32(uint32 num)
> +{
> + return ((uint32) 1) << (pg_leftmost_one_pos32(num-1) + 1);
> +}
> +
> +static inline uint64
> +next_power_of_2_64(uint64 num)
> +{
> + return ((uint64) 1) << (pg_leftmost_one_pos64(num-1) + 1);
> +}
> +
> #endif /* PG_BITUTILS_H */
>
1. Is ceil_log2_64 dead code?
2. The new utilities added here (ceil_log2_32 and company,
next_power_of_2_32 and company) all require num > 1, but don't clearly
Assert (or at the very least document) so.
3. A couple of the callers can actively pass in an argument of 1, e.g.
from _hash_spareindex in hashutil.c, while some other callers are iffy
at best (simplehash.h maybe?)
4. It seems like you *really* would like an operation like LZCNT in x86
(first appearing in Haswell) that is well defined on zero input. ISTM
the alternatives are:
a) Special case 1. That seems straightforward, but the branching cost
on a seemingly unlikely condition seems to be a lot of performance
loss
b) Use architecture specific intrinsic (and possibly with CPUID
shenanigans) like __builtin_ia32_lzcnt_u64 on x86 and use the CLZ
intrinsic elsewhere. The CLZ GCC intrinsic seems to map to
instructions that are well defined on zero in most ISA's other than
x86, so maybe we can get away with special-casing x86?
Cheers,
Jesse
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2020-01-14 20:22:00 | Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?) |
Previous Message | Peter Eisentraut | 2020-01-14 20:13:51 | Re: pause recovery if pitr target not reached |